
Data Centres: How Operators Can Reduce The Risk Of Failure

Even though data centres are becoming more and more reliable, failures still occur, as a recent Uptime Institute survey of operators shows. Here is what the failure record looked like in 2021.

The Uptime Institute regularly surveys international data centre operators about the extent, causes and effects of failures in their data centres. The good news, according to the 2021 study: data centres are becoming more reliable overall, because the scope of IT activity is growing much faster than the number of recorded failures. Nevertheless, operators should carefully examine the causes and effects of failures and take measures to minimise risk.

Almost A Third Of The Data Centres Were Without Failure In 2021

When asked about their largest failure in the last three years, 31 per cent of the participating data centre operators stated that they had not recorded any failure during this period (a 22 per cent improvement over the previous year). The 69 per cent of respondents who were affected classified the severity of their failures as follows: minor (30 per cent), minimal (26 per cent), significant (24 per cent), severe (12 per cent) and critical (8 per cent).

Overall, more than half (56 per cent) of all business interruptions in 2021 could be resolved quickly and inconspicuously. The remaining 44 per cent of failures, however, caused reputational harm and considerable financial damage.

The Downtime Costs For Data Centre Operators Are Increasing

The surveys show that the cost of downtime has risen steadily over the past few years. In 2021, 39 per cent of respondents put the cost of their outages at under US$100,000; for 47 per cent, costs ranged between US$100,000 and US$1 million, and for 15 per cent they exceeded US$1 million. Notably, a few large outliers each year are so costly that they can distort the overall picture, with individual losses adding up to several million or even tens of millions of dollars. The 2019 survey recorded ten major incidents with losses above US$25 million; 2020 recorded three, and 2021 six.

Failure Cause 1: Problems With The Power Supply

When asked about the main causes of major failures (excluding minimal and minor ones), 43 per cent of data centre operators reported problems with the power supply. Three other causes are particularly worrying, each accounting for 14 per cent: cooling system failure, software or IT system errors, and network problems. All other causes of failure are rare, although the frequency of problems with third-party providers is creeping up – for example, with software-as-a-service, hosting or public cloud offerings.

Procedural Flaws Encourage Human Error

The Uptime Institute also asked data centre operators whether they have had outages in the past few years in which human error played a role. Twenty-one per cent of those questioned said no. For the remaining 79 per cent, the causes broke down as follows (multiple answers were possible): 48 per cent incorrect execution by data centre staff, such as non-compliance with procedures; 41 per cent inappropriate staff processes; 36 per cent problems during commissioning; 22 per cent failures in data centre planning; 20 per cent problems with preventive maintenance; and 18 per cent insufficient staffing.

Data Centres: Transparency Gap For Public Cloud Providers

More and more companies are shifting parts of their IT workloads to the public cloud. They thereby make themselves dependent on the measures cloud providers take to ensure resilience – in terms of architecture, availability and management processes. The Uptime Institute asked these companies about the transparency of their cloud providers. The fact that business-critical workloads are increasingly being moved to public clouds suggests that these cloud customers consider the level of transparency sufficient. However, a quarter of respondents are reluctant to move critical workloads to public clouds – but would likely do so if there were greater transparency around resilience.

Systematically Assess And Minimize Risks

If companies want to reduce the risk of failure for their data centres or for the services of their data centre service providers, they must first gain an overview of the specific services and the associated risks that is as objective and systematic as possible. From the internal perspective of the data centre operator, this is often difficult: important risk factors and weak points – for example, in the data centre systems, the infrastructure or the procedures and processes – are easily overlooked. It therefore pays to consult specialised service providers, such as the Uptime Institute. Ideally, these service providers assess the risks according to standardised, neutral procedures and thus arrive at a clear assessment and classification – because data centre operators want to avoid damage in the double-digit million range just as urgently as their customers do.
