CrowdStrike Outage Rolls On; Attention Turns to Software Update Quality Control

The CrowdStrike-Windows outage story continued to play out in airports, online and in stores Monday (July 22) with the focus now turning to the security of what used to be routine software updates.

Both Microsoft and CrowdStrike, as well as other companies affected by the outage, issued updates throughout Monday as they slowly completed the manual processes needed to fix problems caused by a faulty software update pushed out Friday. CrowdStrike in particular added some color to its initial report that a software update to its Falcon Sensor caused the crash, which sent 8.5 million Windows users looking for alternate devices, if any were available.

“On July 19, 2024 at 04:09 UTC, as part of ongoing operations, CrowdStrike released a sensor configuration update to Windows systems,” the company posted. “Sensor configuration updates are an ongoing part of the protection mechanisms of the Falcon platform. This configuration update triggered a logic error resulting in a system crash and blue screen (BSOD) on impacted systems.”

The company also indicated in a separate post that getting Windows-based systems up and running has been a community effort. It said that, together with its customers, it has tested a new technique to accelerate remediation of impacted systems, which it describes in technical detail on its site.

To a non-developer’s eye, all the techniques look to be a variation on manually patching the software update and manually rebooting the system.

Digital Disconnection, Operational Unraveling

Microsoft also announced its own workaround, with VP of security David Weston posting: “We’re working around the clock and providing ongoing updates and support. Additionally, CrowdStrike has helped us develop a scalable solution that will help Microsoft’s Azure infrastructure accelerate a fix for CrowdStrike’s faulty update. We have also worked with both AWS [Amazon Web Services] and GCP [Google Cloud Platform] to collaborate on the most effective approaches.”

All of which might be too late for passengers on Delta. On Monday, Delta and its regional affiliate Endeavor accounted for the vast majority of canceled U.S. flights, while other carriers had mostly recovered their schedules. Delta CEO Ed Bastian told several news outlets that it would take “another couple of days” to get all its operations up and running smoothly.

As the main drama faded, the industry continued to look inward for preemptive strategies that could prevent another CrowdStrike-type outage.

For example, Finexio CEO Ernest Rolfson told PYMNTS that his company, which stresses security in its AP/AR automation platform offering, is seeing heightened concern from current and prospective clients about resilience and fraud detection. It is even seeing increased concern around paper check and invoice fraud, a trend he said the company started seeing a few weeks before the CrowdStrike outage.

“You need to have a multilayered payments infrastructure,” Rolfson said. “You need many form factors and many different options. You need to have trusted third parties to track and verify and validate what you’re doing on a consistent repeatable process. Have someone else come in and do the audits. Most folks are not doing that.”

Rolfson emphasized the critical importance of quality control in software updates, drawing from his own company’s experiences and expressing empathy for companies like Microsoft and their vendors, noting the difficulties inherent in such tasks.

He cited an example from earlier this year when one of the world’s largest banks, a Finexio partner, experienced a bug that affected several of its customers. However, Rolfson was taken aback by the timing of a recent software update from a Finexio partner. The update was rolled out during the workweek, in the morning — a move he found unconventional.

Typically, updates are scheduled after hours or on weekends to minimize disruptions, and best practices suggest staggering the release so that any problems that arise do not become widespread.
