Cloudflare Outages – The Major Incidents

Cloudflare, one of the internet’s most critical infrastructure providers, powers a massive portion of the web through its content delivery network (CDN), DDoS protection, DNS services (including the popular 1.1.1.1 resolver), security features like WAF and Bot Management, and more. When Cloudflare experiences issues, the ripple effects can be widespread, taking down or degrading access to thousands of websites, apps, and services that rely on it.

In late 2025 and early 2026, Cloudflare faced several notable incidents that drew attention to the fragility of even the most robust global networks. These events, while not unprecedented in the industry, highlighted recurring themes: software bugs, configuration errors, and the challenges of scaling complex systems. As of February 4, 2026, no major global outages have been reported in the new year beyond some scheduled maintenance and minor regional issues, but the back-to-back problems in Q4 2025 prompted internal reviews and public explanations from the company.

The Major Incident: November 18, 2025 Outage

The most significant recent disruption occurred on November 18, 2025, starting around 11:20 UTC. Cloudflare’s network experienced widespread failures in delivering core traffic, affecting a large swath of the internet. Services like websites, APIs, and even parts of platforms such as ChatGPT, YouTube, and various gaming services (e.g., League of Legends) became inaccessible or slow for users worldwide.

Cloudflare’s detailed post-mortem, published by CEO Matthew Prince, explained that the root cause was not a cyber attack but an internal software issue. A change to database permissions caused the system generating a “feature file” for Bot Management to output duplicate entries. This doubled the file size unexpectedly. The oversized file propagated across the global network, and when proxies attempted to load it during staggered cycles (every five minutes), many failed due to exceeding hard-coded limits.
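The failure mode described above (a generated file with duplicate entries growing past a hard limit) can be illustrated with a minimal sketch. This is not Cloudflare's code: the names `FeatureStore`, `validate_features`, and the limit `MAX_FEATURES` are hypothetical, and the sketch only assumes that a consumer can validate a candidate file and keep its last known good copy when validation fails.

```python
from dataclasses import dataclass

# Hypothetical hard limit, chosen only for illustration.
MAX_FEATURES = 200


class FeatureFileError(ValueError):
    """Raised when a candidate feature file fails validation."""


def validate_features(names: list[str], limit: int = MAX_FEATURES) -> list[str]:
    """Reject files with duplicate entries or more entries than the limit."""
    if len(set(names)) != len(names):
        raise FeatureFileError("duplicate feature entries")
    if len(names) > limit:
        raise FeatureFileError(f"{len(names)} features exceeds limit {limit}")
    return names


@dataclass
class FeatureStore:
    """Holds the last feature list that passed validation."""

    active: list[str]

    def reload(self, candidate: list[str]) -> bool:
        """Swap in the candidate if it is valid; otherwise keep the current list."""
        try:
            self.active = validate_features(candidate)
            return True
        except FeatureFileError:
            # Keep serving with the last known good file instead of crashing.
            return False


store = FeatureStore(active=["f1", "f2"])
accepted = store.reload(["f1", "f1", "f2", "f2"])  # duplicated entries
print(accepted, store.active)
```

In this sketch a bad file is rejected and the previous list stays active, so a faulty generation step degrades freshness rather than taking the consumer offline.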
The resulting cascade took proxies offline progressively. The outage lasted several hours, with full recovery by around 17:06 UTC, roughly six hours of impact, though effects varied by region and service. It underscored how a seemingly minor permissions tweak in one system could amplify into a global event due to propagation and tight coupling in a distributed architecture.

Cloudflare emphasized transparency in its response, noting the incident affected a broad range of customers and apologizing for the disruption. External reports from sources like The New York Times and The Guardian described it as knocking major sites offline, with some users unable to access services for up to three hours in certain cases.

Just Weeks Later: December 5, 2025 Outage

Less than three weeks after the November incident, another outage hit on December 5, 2025, at 08:47 UTC. This one was shorter (resolved by 09:12 UTC, about 25 minutes) but still significant, impacting around 28% of Cloudflare’s HTTP traffic.

The cause stemmed from changes to body parsing logic made to mitigate a newly disclosed vulnerability in React Server Components (an industry-wide issue patched that week). An undetected code flaw, present for years, was triggered under specific conditions during the update deployment. This led to partial network failures.

Cloudflare’s post-mortem highlighted the incident’s brevity but acknowledged its severity for affected customers. It served as a reminder that even security-driven changes can introduce risks if edge cases aren’t fully covered.

Other Notable Events in 2025 and Early 2026

  • June 12, 2025: Multiple services including Workers KV, Access, WARP, and the dashboard faced outages lasting up to 2 hours and 28 minutes. This earlier event affected developer tools and VPN-like services.
  • July 14, 2025: Issues with the 1.1.1.1 DNS resolver due to invalid routes from a configuration flaw, lasting over an hour and highlighting DNS propagation risks.
  • January 22, 2026 Route Leak: An automated routing policy configuration error leaked BGP prefixes from a Miami data center router. This caused traffic misrouting, impacting Cloudflare customers and external parties funneling through the location. The issue was manually reverted within minutes after detection, but it demonstrated BGP’s sensitivity to automation bugs.
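The route leak in the last item comes down to a router announcing prefixes it should not. As a rough illustration only, the sketch below checks candidate announcements against an explicit allow-list before export. It does not reflect Cloudflare's actual routing policy, which would be written in a router's own policy language; the names `OWN_PREFIXES`, `may_export`, and `filter_exports` are hypothetical, and the prefixes are reserved documentation ranges.

```python
import ipaddress

# Hypothetical prefixes this network is allowed to announce
# (documentation ranges, not real allocations).
OWN_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def may_export(prefix: str, allowed=OWN_PREFIXES) -> bool:
    """Allow export only if the prefix sits inside an explicitly allowed block."""
    net = ipaddress.ip_network(prefix)
    # Compare only within the same IP version; subnet_of raises on a mismatch.
    return any(net.version == a.version and net.subnet_of(a) for a in allowed)


def filter_exports(candidates: list[str]) -> list[str]:
    """Drop every candidate announcement that is not on the allow-list."""
    return [p for p in candidates if may_export(p)]


exports = filter_exports(["192.0.2.0/25", "198.51.100.0/24", "2001:db8:1::/48"])
print(exports)
```

The design choice here is default-deny: a prefix is exported only when it matches a known block, so a policy bug that widens the candidate set does not automatically widen what is announced.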

Cloudflare’s Radar reports and quarterly summaries (e.g., Q4 2025 Internet disruptions overview) noted these alongside broader trends like cable cuts, power outages, and weather-related issues affecting global connectivity. In 2025, over 180 disruptions were tracked industry-wide, with Cloudflare incidents contributing to website and app unavailability.

Cloudflare’s Response and “Code Orange: Fail Small”

Following the November and December outages, Cloudflare declared a “Code Orange: Fail Small” initiative on December 19, 2025. This high-priority effort focused engineering resources on preventing recurrence of the root causes, specifically oversized files, propagation failures, and undetected legacy bugs. The goal: make systems more resilient by failing in smaller, more contained ways rather than globally.

The company has a strong track record of transparent post-mortems, which help the industry learn. These incidents weren’t due to malice or external attacks but internal complexity, a common theme in large-scale distributed systems.

Broader Implications and Lessons

These events raise questions about centralization. Cloudflare serves a huge portion of web traffic, so its issues amplify widely. Critics point out that reliance on a few major providers (Cloudflare, AWS, Google Cloud, etc.) creates hidden single points of failure, even in a supposedly decentralized internet.

For businesses and developers:

  • Diversify providers where possible (multi-CDN setups).
  • Implement fallback DNS, caching, or origin shielding.
  • Monitor status pages and third-party trackers like Downdetector.
  • Test resilience against upstream failures.
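The first and last recommendations in the list above can be sketched together: try providers in order, and simulate an upstream failure to confirm the fallback actually works. This is a minimal illustration, not a production failover design; `fetch_with_fallback` and the two stand-in provider functions are hypothetical, and a real setup would wrap actual HTTP or DNS clients with timeouts and health checks.

```python
from typing import Callable, Iterable


def fetch_with_fallback(fetchers: Iterable[Callable[[], bytes]]) -> bytes:
    """Try each provider in order and return the first successful response."""
    errors = []
    for fetch in fetchers:
        try:
            return fetch()
        except Exception as exc:  # broad catch is acceptable for a sketch
            errors.append(exc)
    raise RuntimeError(f"all providers failed: {errors}")


def primary_cdn() -> bytes:
    """Stand-in for the primary provider, simulated as being down."""
    raise ConnectionError("primary provider unavailable")


def secondary_cdn() -> bytes:
    """Stand-in for a second provider that is still healthy."""
    return b"served by secondary"


body = fetch_with_fallback([primary_cdn, secondary_cdn])
print(body)
```

Passing the providers in as callables makes it easy to inject a failing stand-in during tests, which is one simple way to exercise the upstream-failure path before a real outage does.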