Supervision consists of visualizing and managing all the information related to the operation of the platform. The aim is to anticipate any disruption that could affect the equipment, infrastructure and applications. With these tools, we can check the operation of each element in order to identify and anticipate incidents caused by equipment or application failure.
Zabbix is the enterprise-class monitoring platform we use for real-time monitoring for all of our customers. Internally, it lets us follow bandwidth, resource usage, VM availability and so on, and trigger alerts based on the thresholds we have configured.
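As an illustration of threshold-based alerting, here is a minimal sketch in Python. It is not Zabbix configuration or the Zabbix API; the `Threshold` class, the `evaluate` function, the metric name and the 80/90 percent limits are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical warning/critical limits for one monitored metric."""
    metric: str
    warning: float
    critical: float


def evaluate(value: float, threshold: Threshold) -> str:
    """Return 'ok', 'warning' or 'critical' for a sampled value.

    A value equal to a limit counts as having reached it.
    """
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "ok"


# Assumed example limits: warn at 80% disk usage, critical at 90%.
disk = Threshold(metric="disk_used_percent", warning=80.0, critical=90.0)

if __name__ == "__main__":
    for sample in (75.0, 85.0, 95.0):
        print(disk.metric, sample, evaluate(sample, disk))
```

Checking the critical limit first keeps the function correct even when a value exceeds both limits.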
When an incident occurs, alerts are sent directly to the engineers on duty so that they can react quickly. Any downtime detected outside working hours is reported by automatic phone calls to the dedicated on-duty engineers.
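The routing rule described above can be sketched as follows. This is only an assumption-laden illustration: the working hours (Monday to Friday, 09:00 to 18:00), the severity label `"down"` and the channel names are hypothetical, and the function does not reflect how the alerting is actually configured.

```python
from datetime import datetime, time
from typing import List

# Assumed working hours; the real schedule is not stated in this document.
WORK_START = time(9, 0)
WORK_END = time(18, 0)


def is_working_hours(now: datetime) -> bool:
    """True on Monday-Friday between WORK_START (inclusive) and WORK_END (exclusive)."""
    return now.weekday() < 5 and WORK_START <= now.time() < WORK_END


def notification_channels(severity: str, now: datetime) -> List[str]:
    """Pick the channels used to reach the on-duty engineer.

    Every alert produces a text message; downtime detected outside
    working hours additionally triggers an automatic phone call.
    """
    channels = ["text_message"]
    if severity == "down" and not is_working_hours(now):
        channels.append("phone_call")
    return channels


if __name__ == "__main__":
    print(notification_channels("down", datetime(2024, 1, 6, 3, 0)))
    print(notification_channels("down", datetime(2024, 1, 3, 10, 0)))
```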
To summarize, Zabbix is used to supervise every aspect of the infrastructure and its related services, ensuring effective and efficient monitoring and a rapid recovery:
All areas ranging from network and routing topology to server hardware
From errors on switch ports to connection pool utilization in Java application servers
Day-to-day monitoring of metrics such as disk space, website status, and key service availability
Through notifications or alerts (text messages and automatic phone calls)
Tracking of customers' servers' ability to use maintenance-free proxy servers and networks
Site 24x7 is used as a secondary tool to support the operations performed by Zabbix. It extends the monitoring system with deeper analysis in areas that Zabbix does not cover. It provides the information needed to analyze why and how an incident occurs, and how to resolve it. Site 24x7 also takes web analysis further, to understand why a service is slow and not only whether it is down.
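To make the distinction between "slow" and "down" concrete, here is a small hedged sketch. It does not use the Site 24x7 API; the function `classify_check`, the 2000 ms limit and the rule that a 5xx status counts as down are assumptions made purely for illustration.

```python
from typing import Optional


def classify_check(
    status_code: Optional[int],
    response_ms: Optional[float],
    slow_after_ms: float = 2000.0,  # assumed limit, not taken from this document
) -> str:
    """Classify one web check result as 'down', 'slow' or 'up'.

    A missing status code or response time means the request never completed.
    """
    if status_code is None or response_ms is None:
        return "down"
    if status_code >= 500:
        return "down"
    if response_ms > slow_after_ms:
        return "slow"
    return "up"


if __name__ == "__main__":
    print(classify_check(200, 350.0))
    print(classify_check(200, 4200.0))
    print(classify_check(None, None))
```

A service that answers correctly but late is reported as slow instead of up, which is the case a plain availability check would miss.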