Target audience: Larger enterprises with different monitoring systems.
The biggest problem in a large (enterprise) IT infrastructure is bringing logic (read: sense) to all the Events and Alerts (E&A) generated by the different IT systems and monitoring systems.
Below is a poll from a large network management software vendor.
Too many events and alerts (false positives) reduce the effectiveness of IT operations: important events or alerts can and will be overlooked.
Two major approaches are possible to reduce the noise in events and alerts:
- Forward all events to a central collector and start filtering and alerting from that point.
- Pre-filter the events and alerts (tune the primary monitoring system) and forward only the important E&A to the E&A master collector.
Forward all events to a central collector
This approach looks easy, but it moves the work of making alerts effective (tuning) to a central location. We will go into detail in a separate article.
Pre-filter Events and Alerts
The different IT teams should work together and forward events and alerts to the master alert monitor (with dashboard) only when something is really wrong and needs attention or an intervention. In fact, there are two points where the important Events and Alerts are processed: decentrally and centrally.
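The two processing points can be sketched as follows. This is a minimal illustration only and does not refer to any specific product: the `Event` fields, the severity scale, the `MasterCollector` class and the `pre_filter` function are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical severity scale; real monitoring systems define their own.
INFO, WARNING, CRITICAL = 0, 1, 2


@dataclass
class Event:
    source: str    # team or system that raised the event
    message: str
    severity: int


@dataclass
class MasterCollector:
    """Central point: receives only what the teams decided to forward."""
    received: list = field(default_factory=list)

    def receive(self, event: Event) -> None:
        self.received.append(event)


def pre_filter(events, collector, min_severity=CRITICAL):
    """Decentral point: forward only events that need attention."""
    forwarded = 0
    for event in events:
        if event.severity >= min_severity:
            collector.receive(event)
            forwarded += 1
    return forwarded


collector = MasterCollector()
network_events = [
    Event("network", "interface flap on access port", INFO),
    Event("network", "core switch unreachable", CRITICAL),
    Event("network", "link utilisation above 70%", WARNING),
]
forwarded = pre_filter(network_events, collector)
print(forwarded, [e.message for e in collector.received])
```

Here the network team decides locally what counts as important, and the master collector only sees the one event that needs an intervention.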
The primary monitor should filter out all the noise. How effective alerting is depends on the way the monitoring has been put in place. In a network management system, you always have latency. It is like taking a person's temperature to check for fever: the reading depends on where the thermometer is placed and what type of thermometer is used. By definition, a plain monitor is not calibrated (you need corrections and an offset for the environment at hand). Every monitoring system will generate false positives because, out of the box, it is not aware of the environment (infrastructure). Professionals therefore tune (calibrate) the alerting, based on their knowledge and experience, so that the monitoring system matches the environment. Events and alerts will then give valuable information without the noise.
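A minimal sketch of what such calibration could look like, following the thermometer analogy. The offset, the threshold and the rule that a reading must breach the threshold several times in a row are illustrative assumptions, not values or methods taken from any particular monitoring product; `Calibration` and `alerts` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Calibration:
    """Environment-specific tuning supplied by the responsible team (hypothetical)."""
    offset: float          # correction added to each raw reading
    threshold: float       # corrected value above which a reading is abnormal
    min_consecutive: int   # breaches in a row required before alerting


def alerts(raw_readings, cal):
    """Return the indexes at which an alert fires after calibration."""
    fired = []
    streak = 0
    for i, raw in enumerate(raw_readings):
        corrected = raw + cal.offset
        if corrected > cal.threshold:
            streak += 1
            # Alert once, at the moment the streak reaches the required length.
            if streak == cal.min_consecutive:
                fired.append(i)
        else:
            streak = 0
    return fired


# Raw readings from a sensor assumed to read 0.5 too low in this environment.
readings = [37.0, 38.2, 37.1, 37.9, 38.0, 38.3, 37.0]
uncalibrated = Calibration(offset=0.0, threshold=38.0, min_consecutive=1)
calibrated = Calibration(offset=0.5, threshold=38.0, min_consecutive=3)

print(alerts(readings, uncalibrated))  # [1, 5]
print(alerts(readings, calibrated))    # [5]
```

The untuned monitor alerts on a single transient spike as well as on the sustained rise; the tuned one applies the offset for its environment and alerts only on the sustained condition.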
Organisational impact: the primary filter (pre-filter) is built by the team that is responsible for that specific part of the IT infrastructure.