Monitoring

Monitoring of the server farm happens on three layers:

- System monitoring
- Log monitoring
- Availability monitoring

All three layers are fully automated and raise alerts when required. The monitoring data can also be inspected manually, but only by permitted users.

System monitoring

System monitoring covers all system components, such as:

- CPU
- Load
- RAM
- Disks
- Processes
- Swap
- Network Traffic
- Entropy

and others. We use NetData for this, which runs on each host individually.

A dashboard is available on each host at http://hostname:19999. Access is limited to the IP address of Paragon's head office because this system does not yet support other means of authentication. This may change in the future, at which point we will offer access to other users as well.
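Besides the dashboard, NetData serves the collected metrics over an HTTP API on the same port. The following is a minimal sketch of how a URL for that API could be built. It assumes that the installed NetData version offers the `/api/v1/data` endpoint and a chart named `system.cpu`. The helper name `netdata_data_url` is hypothetical and not part of the server farm tooling, and any request would presumably be subject to the same IP restriction as the dashboard.

```python
from urllib.parse import urlencode


def netdata_data_url(hostname: str, chart: str, seconds: int = 60, port: int = 19999) -> str:
    """Build a URL that asks NetData for the last `seconds` seconds of one chart.

    Assumption: the host exposes NetData's v1 REST API on the dashboard port.
    """
    # A negative "after" value is relative to now, i.e. "the last N seconds".
    query = urlencode({"chart": chart, "after": -seconds, "format": "json"})
    return f"http://{hostname}:{port}/api/v1/data?{query}"


if __name__ == "__main__":
    # "hostname" is the same placeholder used for the dashboard URL.
    print(netdata_data_url("hostname", "system.cpu"))
```

The function only builds the URL and does not send a request, so it can be tried from any machine. Fetching the result still has to happen from a permitted address.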

Log monitoring

The server farm produces plenty of log files, such as:

- syslogs
- Apache logs
- PHP logs
- Auth logs

and many more, often including application-specific logs, e.g. detailed information about each request that the proxy gateway receives and handles.

All that log data is collected by FluentD, forwarded to ElasticSearch for indexing, and automatically monitored by ElastAlert. Manual access to the log archive is available through Kibana at https://logserver, protected by username and password.

Availability monitoring

Websites and other services accessible over TCP/IP are monitored from hosts located around the world. Each of these hosts runs Uptime to check that every service is accessible and delivers the expected response, e.g. a specific HTTP response code or content matched by a regular expression.
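To illustrate the two criteria mentioned above, here is a minimal sketch of how a single response could be evaluated against an expected status code and an optional regular expression. This is not how Uptime itself is implemented. The names `CheckResult` and `evaluate_response` and the default status of 200 are assumptions made for this example only.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    ok: bool
    reason: str


def evaluate_response(
    status: int,
    body: str,
    expected_status: int = 200,
    match: Optional[str] = None,
) -> CheckResult:
    """Decide whether a response counts as "available".

    The status code is compared first. If a pattern is given, it must also
    be found somewhere in the response body.
    """
    if status != expected_status:
        return CheckResult(False, f"unexpected status {status}, expected {expected_status}")
    if match is not None and re.search(match, body) is None:
        return CheckResult(False, f"pattern {match!r} not found in response body")
    return CheckResult(True, "ok")


if __name__ == "__main__":
    print(evaluate_response(200, "<h1>Welcome to Example</h1>", match=r"Welcome to \w+"))
    print(evaluate_response(503, "Service Unavailable"))
```

Note that the pattern is passed here as a plain Python regular expression, without the surrounding slashes used in the `match` parameter of the inventory.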

Access to the Uptime Dashboard is permitted for Paragon DevOps only.

These checks are configured in the inventory. Currently, two kinds of check instances are supported:

Drupal domains

These are configured together with the Drupal settings. The uptime check parameters look like this:

drupal_settings:
  - id: example
    root: /web
    ...
    domains:
      - domain: www.example.com
        ...
        uptime:
          name: Label to identify this check instance
          tags:
            - sample
          pollerParams:
            match: /Text to expect on the site/
            mattermost_hook: [URL of the webhook]

Other services

For domains or services that should be checked but are not Drupal sites, we can define extra check instances anywhere in the inventory like this:

uptime_domains:
  - url: status.linode.com
    uptime:
      name: Linode Status
      path: history.rss
      tags:
        - linode
        - tracker
      pollerParams:
        tracker_source: RSS
        mattermost_hook: [URL of the webhook]

Any number of other services can be defined that way.