Monitoring the server farm happens on 3 different layers:
- System monitoring
- Log monitoring
- Availability monitoring
All these layers are fully automated and raise alerts when required. The monitoring data can also be inspected manually, but only by permitted users.
System monitoring looks into all system components, such as:
- CPU
- Load
- RAM
- Disks
- Processes
- Swap
- Network traffic
- Entropy
and others. We use NetData for this, which runs on each host individually.
A dashboard is available at http://hostname:19999. Access is limited to the IP address of Paragon's head office, as this system does not yet support other means of authentication. This is subject to change, and we will then offer access to other users as well.
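Besides the dashboard, NetData exposes a REST API on the same port, which can be handy for ad-hoc checks from a permitted IP address. The following is a minimal sketch, not part of the actual setup: it assumes the v1 data endpoint (`/api/v1/data`) with its `chart`, `after` and `format` parameters, the chart id `system.cpu`, and the plain JSON payload shape with `labels` and `data`. The helper names `netdata_data_url` and `latest_values` are hypothetical.

```python
from urllib.parse import urlencode


def netdata_data_url(host: str, chart: str, seconds: int = 60, port: int = 19999) -> str:
    """Build a NetData v1 data API URL for the last `seconds` seconds of a chart."""
    # A negative `after` value is interpreted relative to "now".
    query = urlencode({"chart": chart, "after": -seconds, "format": "json"})
    return f"http://{host}:{port}/api/v1/data?{query}"


def latest_values(payload: dict) -> dict:
    """Map each dimension label to its value in the newest row of a payload.

    Assumes the shape {"labels": [...], "data": [[...], ...]} where the
    first column of every row is the timestamp labelled "time".
    """
    labels = payload["labels"]
    # Pick the row with the highest timestamp, independent of row order.
    newest = max(payload["data"], key=lambda row: row[0])
    return {label: value for label, value in zip(labels, newest) if label != "time"}
```

The URL can then be fetched with any HTTP client. Because access is restricted by IP address, requests from anywhere else are expected to be rejected.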
The server farm provides plenty of log files, such as:
- syslogs
- Apache logs
- PHP logs
- Auth logs
and many more, often including application-specific logs, e.g. detailed information about each request that the proxy gateway receives and handles.
All that log data is collected by FluentD, forwarded to ElasticSearch for indexing, and automatically monitored by ElastAlert. Manual access to the log archive is possible through Kibana at https://logserver, protected by username and password.
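To illustrate what automated log monitoring can mean in practice: ElastAlert offers, among others, a `frequency` rule type that fires when a given number of matching events occur within a timeframe. The sketch below shows that idea in plain Python. It is not ElastAlert's implementation, and this documentation does not state which rule types are actually configured; the function name `frequency_alert` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def frequency_alert(timestamps, num_events, timeframe):
    """Return True if at least `num_events` timestamps fall within any
    window of length `timeframe` (sliding-window check)."""
    ordered = sorted(timestamps)
    start = 0
    for end, current in enumerate(ordered):
        # Move the left edge forward until the window spans at most `timeframe`.
        while current - ordered[start] > timeframe:
            start += 1
        if end - start + 1 >= num_events:
            return True
    return False


# Example: three failed logins within one minute would trigger an alert.
base = datetime(2024, 1, 1, 12, 0, 0)
failed_logins = [base + timedelta(seconds=s) for s in (0, 10, 20, 400)]
triggered = frequency_alert(failed_logins, num_events=3, timeframe=timedelta(minutes=1))
```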
Websites and other services accessible over TCP/IP are monitored from different hosts located around the world. Each of these hosts runs Uptime to check whether every service is accessible and delivers the expected response, e.g. an HTTP response code or certain content matched by regular expressions.
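The kind of check described above can be sketched in a few lines of Python. This is only an illustration of the logic (expected status code plus an optional regular expression on the body), not how Uptime itself is implemented. The `Check` class, its field names and the functions `evaluate` and `run` are hypothetical and do not reflect the inventory format.

```python
import re
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class Check:
    """Hypothetical check definition; field names are illustrative only."""
    url: str
    expected_status: int = 200
    pattern: Optional[str] = None  # regular expression the body must contain


def evaluate(check: Check, status: int, body: str) -> bool:
    """Return True if the response has the expected status and content."""
    if status != check.expected_status:
        return False
    if check.pattern is not None and re.search(check.pattern, body) is None:
        return False
    return True


def run(check: Check, timeout: float = 10.0) -> bool:
    """Fetch the URL and evaluate the response; an unreachable service fails."""
    try:
        with urllib.request.urlopen(check.url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return evaluate(check, response.status, body)
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        body = error.read().decode("utf-8", errors="replace")
        return evaluate(check, error.code, body)
    except (urllib.error.URLError, OSError):
        # DNS failures, refused connections and timeouts end up here.
        return False
```

Note that `urlopen` follows redirects, so a check that expects a 3xx status code would need a different HTTP client configuration.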
Access to the Uptime Dashboard is permitted for Paragon DevOps only.
Those checks are configured in the inventory and currently two different instances are supported:
Checks for Drupal sites are configured together with the Drupal settings, including the uptime check parameters.
For domains or services that should be checked but are not Drupal sites, we can define extra check instances somewhere in the inventory.
Any number of other services can be defined that way.