CIOs are now seriously considering ways to avoid single points of failure and are re-evaluating their cloud strategies to prevent any future ‘blue screen’ incidents.
The disruption caused by the CrowdStrike software glitch, which led to a global outage of Windows systems, has sent shockwaves through the IT world. CIOs are now reminded of the inherent risks associated with over-reliance on a single vendor, especially in the cloud.
The incident, which saw IT systems crash and display the “blue screen of death” (BSOD), exposed the vulnerabilities of heavily cloud-dependent infrastructures.
While the issue is being resolved, the potential for catastrophic consequences when a critical security component fails remains. CIOs are now forced to question the resilience of their cloud environments and explore new strategies.
Reevaluating cloud dependencies
“When an issue of such magnitude happens and causes such a big disruption, it is important and necessary to revisit your existing beliefs, decisions, and tradeoffs that went into arriving at the current architecture,” said Abhishek Gupta, CIO at DishTV. “The outcome of the review may still be the same decision but necessary to review,” Gupta said, adding that DishTV is already re-evaluating its cloud strategy in a phased manner after the CrowdStrike incident.
Saurabh Gugnani, Director and Head of CyberDefence, IAM, and Application Security at Netherlands-headquartered TMF Group, added that a diversified approach to cloud strategies could mitigate such risks. “Yes, they [enterprises] should revisit cloud strategies. It has to be a mix of all the available solutions.”
A few organizations have already started making changes.
“In response to recent disruptions affecting our critical operations, we have proactively updated our Business Continuity Plan to address unexpected downtimes and minimize the impact on productivity and service delivery,” said Shivkumar Borade, founder and CMD of Mytek Innovations, which was among the companies hit by the BSOD outage. “Our revised plan includes enhanced communication management, featuring multiple layers to ensure all employees are well-informed about potential issues and their resolution.”
The company’s internal communication was significantly disrupted as its entire network, including Outlook, Teams, and SharePoint, is hosted on Microsoft 365.
“However, our in-house developed application remained unaffected due to GoDaddy’s use of its own hosting infrastructure,” said Borade. “We did experience issues with a few API integrations linked to the Azure platform, which were non-functional for the entire day. This disruption led to interrupted services for both our clients and users.”
Wake-up call for CIOs
Many CIOs are primarily concerned with vendor lock-in. Reliance on a single cloud provider, as demonstrated by the CrowdStrike incident, creates a single point of failure. If a critical service from that provider is disrupted, it can have far-reaching implications for an organization. To mitigate this risk, CIOs are likely to explore multicloud or hybrid cloud architectures, distributing workloads across multiple platforms.
Allie Mellen, a principal analyst at Forrester, emphasized the critical nature of reliable tools and services in the face of cyber threats.
“Reliability of the tools and services cybersecurity teams use is critical in the face of cyberattacks,” Mellen stated. “An incident like this questions that reliability. This will undoubtedly raise questions and concerns from executives about how to ensure the reliability of enterprise systems, especially with technology as integrated into day-to-day operations as cybersecurity software.”
The incident exposed the fragility of cloud-dependent systems where a single point of failure can have cascading effects across an organization. Sunil Varkey, senior security professional and advisor at Beagle Security, noted, “Trust between cloud and security vendors is now questioned. This breach of confidence is likely to drive a higher emphasis on agentless solutions, which can offer enhanced security without the vulnerabilities associated with traditional agents.”
The incident is said to be one of the worst cybersecurity events in terms of the magnitude of its impact. It affected computers running Microsoft Windows across various sectors, including airlines, banks, retailers, brokerage houses, media companies, and railways. The travel sector was notably impacted, with airlines and airports in Germany, France, the Netherlands, the UK, the US, Australia, China, Japan, India, Singapore, and Taiwan facing significant issues with check-in and ticketing systems, leading to flight delays and airport chaos.
Microsoft said around 8.5 million Windows computers were affected.
The impact was so severe that SpaceX and Tesla CEO Elon Musk said his companies had deleted CrowdStrike from all their systems.
Enhanced risk management practices
The incident has highlighted the need for improved risk management practices. Enhanced due diligence, rigorous testing of updates, and phased rollouts are now critical.
“This incident serves as a wake-up call, emphasizing the need for continuous adaptation and improvement in cybersecurity practices across the industry,” said Gaurav Ranade, CTO at RAH Infotech.
D.R. Goyal, senior architect at Rakuten Symphony, advocated for a mechanism to test updates with select users before a full release: “It should have a mechanism to test with certain organizations with a set of users before releasing to the entire community and user base to reduce the impact.”
As the digital landscape evolves, ensuring the resilience of cloud-based systems is paramount. Ashis Guha, founder of An Idea Global Innovations, highlighted broader implications: “The incident has broader implications for the global economy; longer downtimes and recovery times will impact productivity and economics.”
Industry experts recommend several strategies for future preparedness, including phased rollouts, comprehensive testing, and robust backup systems.
Siddharth Ugrankar, co-founder of blockchain firm Qila, suggested that a phased deployment and thorough testing of updates could have mitigated the impact: “If CrowdStrike had deployed the update in a phased manner, the impact would have been far less.”
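For readers who want a concrete picture of what a phased deployment involves, the following is a minimal sketch in Python. It is an illustration only, not a description of how CrowdStrike or any other vendor ships updates. The stage percentages, the failure threshold, and the function names (`bucket`, `hosts_for_stage`, `phased_rollout`) are all assumptions made for this example.

```python
import hashlib

# Hypothetical rollout stages: cumulative percentage of hosts that get the update.
STAGES = [1, 10, 50, 100]
# Assumed threshold: halt if more than 2% of a stage's hosts fail after updating.
MAX_FAILURE_RATE = 0.02


def bucket(host_id: str) -> int:
    """Map a host to a stable bucket in 0..99 using a hash of its ID."""
    digest = hashlib.sha256(host_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def hosts_for_stage(hosts, percent):
    """Return the hosts whose bucket falls under the stage's cumulative percentage."""
    return [h for h in hosts if bucket(h) < percent]


def phased_rollout(hosts, apply_update):
    """Apply an update stage by stage and stop when a stage fails too often.

    apply_update(host) is a caller-supplied function returning True if the
    host is healthy after the update. Returns (updated_hosts, halted_at),
    where halted_at is the stage percentage at which the rollout stopped,
    or None if every stage completed.
    """
    updated = set()
    for percent in STAGES:
        # Only touch hosts that earlier stages have not already updated.
        batch = [h for h in hosts_for_stage(hosts, percent) if h not in updated]
        if not batch:
            continue
        failures = 0
        for host in batch:
            if not apply_update(host):
                failures += 1
            updated.add(host)
        if failures / len(batch) > MAX_FAILURE_RATE:
            return updated, percent  # halt: later stages never receive the update
    return updated, None
```

In this sketch, a faulty update that breaks every host it touches would be stopped after the first non-empty stage, so only a small share of the fleet is affected. A real pipeline would also need health checks, waiting periods between stages, and a rollback path, none of which are shown here.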
Moyukh Goswami, CTO at Nuvepro Enterprises, believes that to prevent issues like the CrowdStrike debacle, IT leaders should bolster their update management, enhance testing protocols across diverse environments, implement rigorous risk assessments, and fortify change management processes with robust governance frameworks.
“Strengthening monitoring capabilities, refining incident response plans tailored to update failures, and fostering proactive vendor relationships are crucial,” Goswami added.
The CrowdStrike incident highlights the need for CIOs to revisit and fortify their cloud strategies. By implementing robust risk management practices, enhancing security measures, and diversifying cloud solutions, organizations can better protect themselves against future disruptions.
As the industry deals with the aftermath of this event, the focus should now shift to building resilient, adaptable, and well-tested cloud strategies to manage an increasingly complex digital landscape.










