Microsoft’s Global IT Outage: Strategies to manage IT downtime

  • 19 Jul 2024
  • Rebecca

On 18th July at 18:00 ET, a mass global IT outage caused by a defect in a content update for Windows hosts hit businesses worldwide, forcing banks and media broadcasters offline and grounding flights.

The outage disrupted travel worldwide, delaying flights in many countries. It also temporarily forced broadcasters offline, delayed global port and rail transport, caused payment systems to fail and, in Alaska, interrupted 911 emergency systems.

CrowdStrike, a cybersecurity software provider whose products run on Microsoft Windows, issued this statement on its website:

“CrowdStrike is actively working with customers impacted by a defect found in a single content update for Windows hosts. Mac and Linux hosts are not impacted. This is not a security incident or cyberattack.

“The issue has been identified, isolated and a fix has been deployed. We refer customers to the support portal for the latest updates and will continue to provide complete and continuous updates on our website.”[1]

This global outage underlines how reliant organizations are on technology to carry out business as usual. Because it was caused by a software update from a supplier, it also underlines the challenges organizations face with third-party suppliers.

The BCI and the British Computer Society (BCS) joint report, Service Resilience and Software Risk 2023, revealed that the risk of software failure is a hurdle to national resilience and that there is insufficient shared understanding of the actual and potential risk of software failures and their impact. Indeed, the BCI Horizon Scan Report 2023 showed that the greatest single disruption for organizations in the preceding twelve months was IT and telecom outages, and that the shift to remote and hybrid working emphasised the need to implement mitigation strategies to deal with them.

To mitigate the effects of IT outages, practitioners should audit ICT systems, and the critical processes and systems reliant on them, to uncover their organization’s challenges in the face of technology failure or cyber-attack. They should also partner with top management to ensure a shared understanding of ICT risks, so that the organization adopts adequate policies, budgets, and processes to prepare for software failures.

Practitioners can also look to regulation pertaining to other sectors, such as the Digital Operational Resilience Act (DORA), which focuses on digital third-party suppliers in order to prevent and manage disruption of entities in the financial sector. Although this is an EU regulation, practitioners elsewhere could extract strategies from it to mitigate and manage IT disruption, aligning with good practice.

This global IT outage highlights the reliance organizations have on their suppliers. BCI research[2] highlights the risk of reputational damage from third-party suppliers. Indeed, the fallout from this event will inevitably cause reputational damage for Microsoft and CrowdStrike, and managing it will require a robust reputational resilience strategy.

The BCI Cyber Resilience Special Interest Group supports cyber resilience and invites subject matter experts to share their insights and offer guidance to organizations. The goal is to leverage collective experience and explore new concepts to improve the field of cyber resilience. Practitioners are welcome to join this group on LinkedIn.
