What's New in Version 0.0.4


First off, let me say a few things. When I started writing NetSaint six months ago, I had no idea that it would blossom into the creature that it has. If it were up to me, I probably would have stopped development after the 0.0.2 release. I have a fairly small LAN that doesn't have anywhere near the number of nodes or the complexity that some of your networks do, so I don't necessarily need all the new features that are present. A lot of the features added in this release are the direct result of requests and ideas submitted to me by end users like yourself, and that keeps development going. If you like NetSaint, all I ask is that you continue to offer your support in the form of bug reports, suggestions for new features, plugins, FAQs, short tutorials, etc. Anything that you can contribute to this project is much appreciated.

There are a lot of things that still need to be ironed out and improved upon - the documentation, the plugins, and the cleanliness/optimization of the code are a few things that come to mind. Given enough time, I will get around to improving these things. NetSaint consumes a lot of my time right now, and it's not unusual for me to spend several hours each day working on it. The documentation alone takes as long or longer to put together than the actual code does. All that aside, I still think that NetSaint is a darn good software package right now, and it is only going to get better.

Anyway, enough jabber. Here are the major things that have been added and/or changed since the 0.0.3 release (and there are some major ones)...

New Features

  1. Notification Time Periods. This hasn't been requested by many people, but I know many of you will like it. Previous releases of NetSaint would send out notifications about service or host issues 24 hours a day, 7 days a week. While that was fine and good, I thought it would be nicer to have more control over when notifications go out. Time periods are defined in the host configuration file and allow you to specify multiple time ranges per weekday, on a one-week rotating basis. Once you define the time periods that suit your needs, you must add them to your contact, host, and service definitions. While this is going to mean tedious editing of the config file, it will give you a lot of flexibility in deciding who gets notifications, when, and about what. I would highly recommend reading the "Time Periods - How They Affect Notifications and Service Checks" document in the theory of operation section of the documentation, as it describes in more detail how time periods work and some pitfalls you might encounter if you use them. I was originally planning on adding this feature in 0.0.5, but I just couldn't resist.

  2. Service Check Time Periods. This feature has not been fully implemented yet, so don't expect it to work perfectly if you want to use it. After I added time period definitions to NetSaint, I thought it would be nice to have control over when a service can be checked. By assigning a specific time period to the new check_period argument in service definitions, you can control when NetSaint is allowed to monitor a particular service. When it comes time to reschedule a service check, NetSaint checks whether the "requested" recheck time is allowed in the check period. If it isn't, NetSaint will reschedule the service check at the next available time slot in the time period. This means that the service might not get checked again for up to a week if you only allow it to be checked for a short period on a single day of the week. Read the "Time Periods - How They Affect Notifications and Service Checks" document in the theory of operation section of the docs for more information on this. There are several things that you must be aware of if you choose not to monitor your services 24 hours a day, 7 days a week.

  3. Contact Groups For Services. Contact groups can now be assigned to specific services. Previous versions of NetSaint only allowed contact groups to be associated with host groups. The result of this was that contacts were notified about both host and service issues, even though they might not be responsible for them. By assigning contact groups to services (using the new contact_groups argument in service definitions), you have the ability to specify different sets of contacts that should be notified for service and host problems. Let's say you are monitoring a particular application running on one of your hosts. The "admin" for that service doesn't necessarily need to be notified of problems with the host, as they may not have the ability, technical knowledge, or access rights needed to fix the problem. However, they do need to be notified of problems with the service itself. The addition of contact groups to specific services fixes this problem. Note: As a result of the addition of contact groups for specific services, the contact groups specified in host groups have been retasked. Host group contacts now only receive notifications about issues directly relating to hosts in the host group (when the hosts go down, become unreachable, or recover).

  4. Host Notification Options For Contacts. Contact definitions now allow you to specify what types of host issues the contact can be notified about. These include when a host goes down, becomes unreachable, or experiences a recovery from a previous problem.

  5. Host Dependencies. The ugly local router stuff present in the 0.0.2 and 0.0.3 releases has disappeared and has been replaced with a better method of checking host reachability. Remote hosts are now dependent on a "parent" host, which may be dependent on another host, etc. until you reach a point where the "grandfather" host is not dependent on anything because it is local to the host which is running NetSaint. Host dependencies are specified by the parent_host option in the host configuration file. More information on host dependencies and how they affect monitoring can be found in the "Determining Status and Reachability of Network Hosts" document of the theory of operation section.

  6. Aggressive Host Checking Option. Beginning with release 0.0.4, NetSaint tries to be a little smarter about how and when it checks the status of hosts. In general, disabling this option will allow NetSaint to make some smarter decisions and check hosts a bit faster. Enabling this option will increase the amount of time required to check hosts, but may improve reliability a bit. If you want to know more about exactly what this option does, search the source code in the netsaint.c file for the string "use_agressive_host_checking" (note the spelling) and read some of the comments I've added. Unless you have problems with NetSaint not recognizing that a host recovered, I would suggest not enabling this option.

  7. Retry Interval Option For Services. A new option (retry_interval) has been added to service definitions. This allows you to better control the rate at which services are re-checked when they first change to a non-OK state. Note: Unless you change the interval_length variable in the main configuration file, the minimum time you can specify for service retries is 1 minute. If you change the interval_length variable to a number less than 60 (seconds), you'll have to change all check and notification interval values in your service definitions as well.

  8. New Status Log Format. The format of the status log has changed for the better. It now records the status of all hosts, regardless of whether they are up, down, or unreachable. This can be handy if you want to scan the log file for information on how many hosts are down, etc.

  9. Plugins Have Been Split Off. For various reasons, I have decided to split the plugins off from the main distribution of NetSaint. Since they are still required to actually use NetSaint, you'll need to download them separately from www.netsaint.org.

  10. New Directory Structure. Release 0.0.4 uses a new directory structure for installation. Read more about it in the installation instructions.
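
The rescheduling rule described in item 2 (use the requested recheck time if the check period allows it, otherwise move to the next available slot) can be illustrated with a short sketch. This is not NetSaint's code, which is written in C; it is a minimal Python illustration under stated assumptions. The in-memory layout of a time period, the name BACKUP_WINDOW, and the function next_allowed_time are all hypothetical, and daylight saving time is ignored.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory form of a time period (an assumption, not the
# NetSaint config syntax): weekday (0 = Monday, as in datetime.weekday())
# mapped to a list of (start, end) ranges in minutes after midnight.
# The end of a range is exclusive.
BACKUP_WINDOW = {
    5: [(60, 180)],  # Saturdays, 01:00-03:00 only
}

def next_allowed_time(requested, period):
    """Return the earliest time >= requested that falls inside the period.

    Returns None if the period contains no time ranges at all.
    """
    midnight = requested.replace(hour=0, minute=0, second=0, microsecond=0)
    # Look at today plus the next seven days, so that a range on today's
    # weekday that has already passed is found again one week later.
    for day_offset in range(8):
        day_start = midnight + timedelta(days=day_offset)
        for start, end in sorted(period.get(day_start.weekday(), [])):
            range_start = day_start + timedelta(minutes=start)
            range_end = day_start + timedelta(minutes=end)
            if requested < range_start:
                return range_start  # the next slot opens later
            if requested < range_end:
                return requested    # the requested time is already allowed
    return None
```

With this sample period, a recheck requested on a Wednesday at noon is pushed to the following Saturday at 01:00, and one requested at exactly 03:00 on a Saturday waits a full week, which is the "up to a week" delay mentioned in item 2.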
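
The parent chain described in item 5 can be pictured as a simple walk from a host up to the host that is local to the NetSaint machine. The sketch below is one plausible reading, not the logic in netsaint.c: it assumes that a host which fails its check is "unreachable" when any host above it in the chain also fails, and "down" otherwise. The host names, the PARENTS table, and both function names are made up for illustration; the "Determining Status and Reachability of Network Hosts" document remains the reference for the real behavior.

```python
# Hypothetical data: host name -> parent host name. None means the host is
# local to the monitoring machine and depends on nothing. This mirrors the
# idea behind the parent_host option, not its config syntax.
PARENTS = {
    "router1": None,
    "router2": "router1",
    "web1": "router2",
}

def parent_chain(host, parents):
    """Return the ancestors of host, from its immediate parent up to the local host."""
    chain = []
    seen = {host}
    current = parents[host]
    while current is not None:
        if current in seen:
            raise ValueError(f"parent loop detected at {current!r}")
        seen.add(current)
        chain.append(current)
        current = parents[current]
    return chain

def classify(host, parents, responds):
    """Classify a host as 'UP', 'DOWN', or 'UNREACHABLE'.

    responds maps each host name to True if it answered its check.
    Assumption: a failing host counts as unreachable when any ancestor
    also fails, because the fault then lies somewhere upstream.
    """
    if responds[host]:
        return "UP"
    for ancestor in parent_chain(host, parents):
        if not responds[ancestor]:
            return "UNREACHABLE"
    return "DOWN"
```

For example, if router2 stops answering, web1 behind it is reported as unreachable rather than down, while router2 itself is reported as down because router1 still responds.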
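
The note in item 7 comes down to simple arithmetic: interval values are counted in units of interval_length seconds, so shrinking interval_length shrinks every interval unless the values are scaled up to match. The following sketch assumes the default interval_length of 60 seconds implied by the one-minute minimum; the helper names to_seconds and rescale are hypothetical and not part of NetSaint.

```python
def to_seconds(units, interval_length=60):
    """Convert a configured interval (in time units) to seconds."""
    return units * interval_length

def rescale(units, old_length, new_length):
    """Return the unit count that keeps the same wall-clock duration
    after interval_length changes from old_length to new_length."""
    seconds = units * old_length
    if seconds % new_length:
        raise ValueError("duration is not a whole number of new units")
    return seconds // new_length
```

With interval_length lowered from 60 to 10, a check interval of 5 (five minutes) has to become 30 to mean the same thing, while a retry interval of 3 now gives a 30-second retry.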