According to Tom’s Guide, which tracked 75 outages across nearly 50 services in 2025, the year was defined by massive infrastructure failures. The single largest event was an Amazon Web Services (AWS) outage on October 21 that lasted more than 12 hours. It hit the US-EAST-1 region and potentially more than 1,000 apps, from Snapchat to Delta Air Lines, and was traced to a DNS issue in AWS’s DynamoDB service. OpenAI’s ChatGPT suffered 11 outages, including a brutal 10-hour global blackout on June 10. Cloudflare had two major incidents; in the November 18 outage, a database change caused internal software to fail and took down nearly a quarter of all websites for six hours. Other lengthy disruptions included a 24-hour PlayStation Network outage starting February 2 and a multi-day X (Twitter) interruption in May after an Oregon data center fire.
The Backbone Problem
Here’s the thing that 2025 made painfully clear: we’ve built a digital world on a handful of highly centralized pillars. When AWS or Cloudflare sneezes, the entire internet catches a cold. That is barely an exaggeration: the Cloudflare crash in November even took down Downdetector, the site we all frantically refresh to see if it’s just us. The technical explanations are almost always some variation of a “cascading failure.” At Cloudflare, a small permission change in a database doubled the size of a file, and the internal software that depended on it failed. At AWS, a DNS “address book” effectively forgot where servers live. These aren’t exotic cyber-attacks; they’re often mundane operational errors that spiral because the systems are so complex and interconnected. There’s no easy fix, either. The efficiency and scale these cloud giants offer are the reason they exist, and the reason so many services depend on them. The trade-off is that each one becomes a massive single point of failure.
AI and the New Fragility
If AWS represents the old-guard infrastructure fragility, OpenAI’s ChatGPT represents the new. Eleven outages in a single year? That’s close to one reliability crisis a month. It shows how the breakneck pace of deploying and scaling complex AI models is running headlong into the old-fashioned need for stable, resilient systems. Think about it: when a search engine goes down, you grumble and use another one. When *the* dominant conversational AI goes down for 10 hours, entire workflows for millions of people and businesses just stop. There’s no easy alternative. This isn’t just a server problem; it’s about the stability of a new layer of the internet stack that, for some users, is becoming as essential as electricity. And OpenAI’s typical opacity about causes doesn’t inspire confidence that the underlying problems are being fixed at the root.
When Outages Get Physical
The most jarring outages are the ones that leap from our screens into our physical lives. The Garmin watch bug in January wasn’t just an app glitch: it bricked the GPS on devices people rely on to navigate hikes or track serious workouts. The string of payment service outages (Apple Pay, PayPal, Zelle) hit people in the wallet, disrupting everything from splitting a lunch bill to paying rent. The Zelle outage in May, tied to a third-party processor, showed how the “it’s just software” illusion vanishes when you can’t access your own money. And let’s not forget the PlayStation Network’s 24-hour marathon. Even for a service people use for entertainment, a full day is an eternity. Sony’s compensation of “5 extra days” of a subscription feels pretty weak when you’ve lost your entire Friday night gaming session.
The Opaque Accountability Game
What’s almost as frustrating as the outages themselves is the communication, or lack thereof. Cloudflare, to its credit, published a detailed blog post explaining the technical misstep. But it’s the exception. Google engineers once suggested users reboot their *routers* during a Workspace outage, a classic “have you tried turning it off and on again” move that screams “we have no clue either.” X (Twitter) is famously secretive, which makes its two-tweet acknowledgment of the data center fire, including one from the @XEng account, a minor miracle. So what’s the takeaway? We’re more dependent than ever on systems that fail in unpredictable ways, run by companies that often don’t feel obligated to tell us why. As AI gets woven deeper into everything, expect this tension between innovation and instability to get worse, not better.
