Keeping IT Up

Whether it’s a business website, an intranet application, a set of micro-services or a large infrastructure set-up, it’s essential that those services stay up and running, available to your collaborators or customers. TenTwentyFour¹⁰²⁴ offers monitoring services on either our own or your pre-existing infrastructure, not only to keep an eye on critical services, but also to intervene and solve or mitigate problems as they occur, or even before.

Examples of systems that we monitor range from simple checks on your website, making sure it’s available and responsive and that the underlying framework or CMS is up to date, through detection of DDoS or brute-force attacks on your application’s authentication, all the way to watching multiple system and application vitals distributed across a larger server infrastructure.

TenTwentyFour¹⁰²⁴ uses a three-fold monitoring set-up for Log-Management, Trending, and Alerting. Log-Management and Trending allow us to detect problems that lurk in the future, while Alerting notifies us immediately should part of a system suddenly enter a critical state.

Let us advise you on which components of your system are most critical to assess. You, as our customer, then decide which metrics to check and how many, whether and how fast we should intervene, and whom we should notify, and after what delay, if the state of one of your systems becomes problematic or even critical.

Red Alert!

We check several system vitals on each host to make sure the server is up, running and healthy.

The TenTwentyFour¹⁰²⁴ control post runs Icinga2 to make sure all monitored systems and services are nominal.

Several availability, health and performance checks are defined either manually or automatically, deployed through our configuration management utility, and then run against the respective systems every few minutes. Such checks can be quite basic or more complex, if required. We always start by checking whether the systems are reachable over the network and might end up, for instance, checking when the latest record was written to a specific table in a database.

With hundreds of checks already made available by the Free Software community, we’ve already got some ground covered, but for anything that needs checking and doesn’t yet have a ready-made plug-in or utility, we create custom check scripts to cover all the bases.

Custom-defined thresholds allow us to precisely specify when a service enters a problematic state and when this state becomes critical. As with trending, reacting quickly, as soon as services leave their nominal state, allows us to intervene pre-emptively and take counter-measures.
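To illustrate how a custom check with thresholds can work, here is a minimal sketch of the database freshness check mentioned above. It follows the widely used Nagios/Icinga plug-in convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). The function names, the default thresholds of 10 and 30 minutes, and the sample timestamps are assumptions made for this example, not our production configuration.

```python
from datetime import datetime, timedelta, timezone

# Exit codes follow the Nagios/Icinga plug-in convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify_age(age, warn, crit):
    """Map the age of the newest record onto a plug-in state."""
    if age < timedelta(0):
        # A record "from the future" usually means clock skew: report UNKNOWN.
        return UNKNOWN
    if age >= crit:
        return CRITICAL
    if age >= warn:
        return WARNING
    return OK


def check_freshness(last_written, now,
                    warn=timedelta(minutes=10),   # assumed threshold
                    crit=timedelta(minutes=30)):  # assumed threshold
    """Return (exit_code, message) for the newest record's timestamp."""
    age = now - last_written
    state = classify_age(age, warn, crit)
    message = f"{LABELS[state]} - newest record is {int(age.total_seconds())}s old"
    return state, message


# Hypothetical sample values; a real check would read the timestamp
# from the monitored database table.
sample_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample_state, sample_message = check_freshness(
    sample_now - timedelta(minutes=12), sample_now
)
print(sample_message)
```

A real plug-in would end by passing the state to sys.exit(), so that the monitoring system can read the result from the process exit code.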

Whenever Icinga2 detects a service leaving its nominal state, TenTwentyFour¹⁰²⁴ is notified through several independent communication channels, allowing us to react as soon as possible. You, as our customer, may wish to have notifications go to us first and only escalate to your own IT department after some hours, or the other way around, depending on your preferred SLA.
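As a rough illustration of such an escalation rule, the sketch below decides who should be notified depending on how long a problem has lasted. It is plain Python rather than Icinga2 configuration, and the contact group names and the two-hour delay are hypothetical values chosen for the example.

```python
from datetime import timedelta

# Hypothetical policy: (delay after the problem started, contact group).
ESCALATION_POLICY = [
    (timedelta(0), "tentwentyfour-oncall"),  # notified immediately
    (timedelta(hours=2), "customer-it"),     # escalated after two hours
]


def recipients(problem_age, policy=ESCALATION_POLICY):
    """Return every contact group whose delay has already elapsed."""
    return [group for delay, group in policy if problem_age >= delay]


print(recipients(timedelta(minutes=30)))
print(recipients(timedelta(hours=3)))
```

Reversing the order of the two entries would give the "other way around" variant, where your own IT department is notified first.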

Keeping Book

Centralised log-management is another important pillar of watching over your infrastructure. Most services on your infrastructure already log detailed information about anything that happens and especially about out-of-the-ordinary incidents. Why ignore that data, when you have a treasure-trove right under your nose?

Keeping your logs all in one place, with mighty tools to aggregate and analyse them.

If you manage a single server instance, you could always log into your server and grep through your logfiles. But imagine you have dozens, if not hundreds, of servers: how would you keep an eye on all those logs? How would you spot the one entry that gives away a security issue? How will you access and analyse your logs to determine what happened when your server becomes unreachable? And which logs will you analyse in the worst-case scenario, where your server has been compromised and the attacker has deleted all logs to cover their tracks?

This is where centralised log-management comes into play.

TenTwentyFour¹⁰²⁴ uses Graylog2 to ship logs from all its servers and some critical services to a central Graylog/Elasticsearch cluster which aggregates and indexes the log entries. With all your log entries in one, easily searchable index, we can set up dashboards to detect trends and special events in log files effortlessly.

For instance, at TenTwentyFour¹⁰²⁴, one dashboard displays the rate of emails rejected directly by our greylists or classified as spam or ham, along with a pie chart of the recipient addresses that receive the most spam.

Additionally, instead of alerting from system checks only, we can now leverage the data that is logged naturally and periodically to detect anomalies, and thus create alerts that notify us whenever specific events are logged or whenever services that were expected to log something fail to do so.
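The second case, a service that stops logging when it should not, can be sketched as follows. This is a plain Python stand-in for the idea, not the Graylog API: the function name, the service names and the intervals are assumptions made for the example, and the timestamps of the last log entry per service would in practice come from the central index.

```python
from datetime import datetime, timedelta, timezone


def silent_sources(last_seen, expected_every, now):
    """Return services that have not logged within their expected interval.

    last_seen:      service name -> timestamp of its newest log entry
    expected_every: service name -> longest acceptable gap between entries
    """
    return sorted(
        name
        for name, interval in expected_every.items()
        # A service that never logged at all also counts as silent.
        if name not in last_seen or now - last_seen[name] > interval
    )


# Hypothetical sample data.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "backup": now - timedelta(hours=30),
    "mailer": now - timedelta(minutes=3),
}
expected_every = {
    "backup": timedelta(hours=24),
    "mailer": timedelta(minutes=15),
    "cron-report": timedelta(hours=1),
}
print(silent_sources(last_seen, expected_every, now))
```

With the sample data, "backup" is flagged because its last entry is older than a day, and "cron-report" because it has never logged anything.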

Contact us to discuss your monitoring needs, and together we will come up with a detailed monitoring plan and a Service Level Agreement (SLA) that fit them.
