Hosting and Maintenance
This document gives an overview of the key concepts, components and tools we use for hosting and maintenance. As you will see, one of our key areas of expertise is automated monitoring, which keeps our systems highly stable.
This document is linked to the ALM document, which follows similar philosophies and principles.
The following sections describe our environment in more detail.
Infrastructure as Code
Infrastructure as Code is one of our key concepts. All code for creating our infrastructure and tools is kept as Ansible scripts in our GitLab. These scripts can be executed at any time through jobs in a build pipeline. This approach is also a key part of our recovery concept, which is described later in this document.
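A pipeline job that runs an Ansible script usually comes down to a single `ansible-playbook` call against an inventory. The Python sketch below only builds that command line. The helper name, the playbook `site.yml` and the inventory path are hypothetical. The options `-i`, `--limit`, `--check` and `--diff` are standard `ansible-playbook` flags. Using `--check` is one possible way for a job to preview changes before applying them.

```python
import shlex


def build_playbook_command(playbook, inventory, limit=None, check=False):
    """Return the argv list a pipeline job could pass to subprocess.run()."""
    cmd = ["ansible-playbook", "-i", inventory]
    if limit:
        # Restrict the run to a host or group pattern from the inventory.
        cmd += ["--limit", limit]
    if check:
        # Dry run: report what would change (with diffs) without changing anything.
        cmd += ["--check", "--diff"]
    cmd.append(playbook)
    return cmd


if __name__ == "__main__":
    # Hypothetical playbook and inventory names, for illustration only.
    cmd = build_playbook_command("site.yml", "inventory/production", limit="web", check=True)
    print(shlex.join(cmd))
    # A real job would then execute it, e.g. subprocess.run(cmd, check=True).
```

Because the same function serves every job, a dry run and a real run differ only in one argument. This keeps the script, not the pipeline, as the single source of truth.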
Using sensors and heartbeats, we monitor each host by checking:
- Whether the host is running
- Memory usage
- Disk I/O
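At its core, a heartbeat check asks one question: has this host reported recently? The minimal Python sketch below makes two assumptions. First, each host sends periodic heartbeats. Second, a fixed timeout (30 seconds here, an illustrative value) is enough to treat a silent host as down. The class and method names are hypothetical and are not part of Netdata or Beats.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    # Hypothetical threshold: silence longer than this marks a host as down.
    timeout_s: float = 30.0
    # Maps host name -> timestamp of its most recent heartbeat.
    last_seen: dict = field(default_factory=dict)

    def beat(self, host, now=None):
        """Record a heartbeat received from `host`."""
        self.last_seen[host] = time.time() if now is None else now

    def down_hosts(self, now=None):
        """Return hosts whose last heartbeat is older than the timeout, sorted by name."""
        now = time.time() if now is None else now
        return sorted(h for h, t in self.last_seen.items() if now - t > self.timeout_s)


if __name__ == "__main__":
    mon = HeartbeatMonitor(timeout_s=30)
    mon.beat("web-01", now=100)
    mon.beat("db-01", now=60)
    print(mon.down_hosts(now=100))  # ['db-01']
```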
We use Netdata, which monitors several thousand metrics every second.
As the picture above shows, we collect the logs from every host and every application in use. Because this is a large amount of data, we use Elasticsearch to analyze it. The analysis can also raise an alert, which is sent to the alerting system described below.
With Netdata and logging in place, we can analyze all of our components. This lets us trace the cause of problems and even prevent problems before they occur.
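In practice, such a rule would run as a query inside Elasticsearch. To keep the logic visible, the sketch below evaluates a comparable rule in plain Python over log records that have already been fetched. The record fields (`host`, `level`), the threshold and the alert shape are all assumptions made for illustration.

```python
from collections import Counter


def error_alerts(records, threshold):
    """Count ERROR records per host and return one alert per host at or above `threshold`."""
    counts = Counter(r["host"] for r in records if r["level"] == "ERROR")
    return [
        # The "source" field tells the alerting system where the alert came from.
        {"host": host, "errors": n, "source": "log-analysis"}
        for host, n in sorted(counts.items())
        if n >= threshold
    ]


if __name__ == "__main__":
    records = [
        {"host": "web-01", "level": "ERROR"},
        {"host": "web-01", "level": "ERROR"},
        {"host": "db-01", "level": "INFO"},
        {"host": "db-01", "level": "ERROR"},
    ]
    print(error_alerts(records, threshold=2))
```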
As seen in the picture, the alert deduplication component has four incoming channels:
- Results of the log analysis
- Host sensors
- Host heartbeats
This component is critical because it gives the DevOps staff a single pool of alerts. Removing duplicate alerts gives a clear overview of what has happened.
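Deduplication depends on deciding when two alerts describe the same problem. The minimal Python sketch below assumes that an alert is identified by its host and the name of the failing check; both field names are hypothetical. With that rule, a heartbeat alert and a sensor alert about the same outage collapse into one entry. A real component would probably also expire entries after a time window, and this sketch omits that.

```python
def alert_key(alert):
    """Hypothetical identity of an alert: the affected host plus the failing check."""
    return (alert["host"], alert["check"])


def deduplicate(alerts):
    """Merge alerts with the same key, keeping the first one and counting repeats."""
    pool = {}
    for alert in alerts:
        key = alert_key(alert)
        if key in pool:
            pool[key]["count"] += 1
        else:
            # Copy the alert so the caller's dict is not modified.
            pool[key] = {**alert, "count": 1}
    return list(pool.values())


if __name__ == "__main__":
    alerts = [
        {"host": "db-01", "check": "host_down", "channel": "heartbeat"},
        {"host": "db-01", "check": "host_down", "channel": "sensor"},
        {"host": "web-01", "check": "disk_io", "channel": "sensor"},
    ]
    for entry in deduplicate(alerts):
        print(entry)
```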
Automatic Ticket Creation
When something goes wrong, our alert deduplication component automatically creates a ticket in GitLab. This lets us react to the potential problem as quickly as possible.
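Opening a ticket takes a single call to the GitLab REST API: `POST /api/v4/projects/:id/issues`, authenticated with a `PRIVATE-TOKEN` header. The Python sketch below only builds that request with the standard library and does not send it. The GitLab URL, project path, label names and alert fields are hypothetical. Error handling and retries are left out.

```python
import json
import urllib.parse
import urllib.request


def build_issue_request(gitlab_url, project, token, alert):
    """Build (but do not send) a GitLab API request that opens an issue for an alert."""
    # The project may be a numeric ID or a "group/project" path, which must be URL-encoded.
    project_id = urllib.parse.quote(str(project), safe="")
    url = f"{gitlab_url.rstrip('/')}/api/v4/projects/{project_id}/issues"
    payload = {
        "title": f"[{alert['host']}] {alert['check']}",
        "description": f"Automatically created from {alert.get('count', 1)} alert(s).",
        "labels": "alert,auto-created",  # hypothetical label names
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_issue_request(
        "https://gitlab.example.com", "ops/infra", "<token>",
        {"host": "db-01", "check": "host_down", "count": 2},
    )
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```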
CI / CD Pipelines and Communication
The tasks defined in the pipelines can be triggered from various sources, such as:
- Schedule, e.g. running every night
- Action of a developer in Git (our VCS)
- Command from the communication platform. We use the ChatOps feature in Mattermost.
Each task is developed only once, in line with our DRY (don't repeat yourself) principle, and can then be started by any of these triggers.
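The DRY idea can be pictured as a single registry of tasks that every trigger calls into. In the Python sketch below, the task name, its placeholder body and the host are hypothetical. `CI_PIPELINE_SOURCE` is the GitLab CI variable that reports how a pipeline was started, for example `schedule`, `push`, or `chat` for ChatOps. Whether our pipelines dispatch on it this way is an assumption.

```python
import os

TASKS = {}


def task(func):
    """Register a task exactly once so that every trigger can reuse it (DRY)."""
    TASKS[func.__name__] = func
    return func


@task
def update_packages(host):
    # Hypothetical placeholder: a real task would run an Ansible playbook here.
    return f"updated packages on {host}"


# Pipeline sources (as reported by GitLab CI) that map to our three trigger types.
TRIGGERS = {"schedule", "push", "chat"}


def dispatch(task_name, host, source=None):
    """Run a registered task; the trigger decides when it runs, not what runs."""
    source = source or os.environ.get("CI_PIPELINE_SOURCE", "push")
    if source not in TRIGGERS:
        raise ValueError(f"unsupported trigger: {source}")
    return f"[{source}] " + TASKS[task_name](host)


if __name__ == "__main__":
    print(dispatch("update_packages", "web-01", source="schedule"))
    print(dispatch("update_packages", "web-01", source="chat"))
```

Both calls run the same function. Only the reported trigger differs, which is the point of defining each task once.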
Result of CI / CD Pipelines
The pipelines have three main goals:
- Provisioning: covers all the tools on the host
- Maintenance: covers the configuration of the applications
- Deployment: installs and updates the application
Manage Configuration in Repository
The configuration of each host is managed in Git repositories. This lets us keep track of changes and easily recover the system, as explained in the next chapter.
The following components are backed up according to the 3-2-1 principle (at least three copies of the data, on two different types of storage media, with one copy kept off-site) and can be recovered.
Note: The recovery time for the database and user data cannot be determined in advance, because it depends on the amount of data and the network speed.
| Area | Tool |
|---|---|
| Repo / Versioning | GitLab |
| Provisioning / Deployment / Maintenance | Ansible |
| Sensors | Netdata and Beats |
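The 3-2-1 rule mentioned above requires at least three copies of the data, on at least two different types of storage media, with at least one copy off-site. It can be expressed as a simple check. In the Python sketch below, the `Copy` type and the location and medium names are hypothetical and only model a backup plan for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str  # where the copy lives, e.g. "primary", "nas", "cloud" (hypothetical)
    medium: str    # storage medium, e.g. "disk", "tape", "object-storage"
    offsite: bool  # True if the copy is stored outside the primary site


def satisfies_3_2_1(copies):
    """Check the 3-2-1 rule: >= 3 copies, on >= 2 different media, with >= 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


if __name__ == "__main__":
    plan = [
        Copy("primary", "disk", False),
        Copy("nas", "disk", False),
        Copy("cloud", "object-storage", True),
    ]
    print(satisfies_3_2_1(plan))      # True
    print(satisfies_3_2_1(plan[:2]))  # False: only two copies, none off-site
```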