Outages were felt by nearly everyone, reminding us in dramatic fashion just how fragile the internet really is.
Written By: Gary Hauser,
Principal Architect & Head of the Sales Architectures Team for Copper River IT
Many people who know their tech history know that the internet's predecessor, ARPANET, was originally designed to survive a nuclear war. On Tuesday, August 12th, 2014, at five o'clock in the morning Eastern Time, network mailing lists started filling up with notifications of outages and unreachable sites around the world. Not exactly nuclear war, but the event did remind us in dramatic fashion just how fragile the internet really is.
The smart people weren't taken by surprise. The events of August 12 did not reveal an unknown problem; the issue had already been pointed out by academics in theoretical white papers on how to bring the internet down, and by members of the network community who maintain slightly older equipment at locations around the net.
An Issue of Maps and Scale
The issue, simply put, is one of destinations on a map and the map's ability to scale. The Border Gateway Protocol (BGP) routing tables that make up the maps of all of the destinations in the Default Free Zone (DFZ) at the top tiers of the internet ran out of space to support all of those destinations. Much of the older (three- to five-plus-year-old) equipment still in use today was designed with the best technology available at the time, on the assumption that the internet core DFZ would never reach the 512K mark (524,288 routes). As we know, things have changed dramatically since then. We now have space for roughly 4,000,000,000 (four billion) different Autonomous System numbers in the ASN tagging scheme that makes up the AS Path through the internet to your final network destination. Think of the AS Path as a list of highways you would need to take in order to reach the off-ramp exit for the local destination of your choice. Approximately 48,333 of these ASN highways were in use on the AS Path list as of August 12th, 2014. At 4:50am Eastern Time, the BGP routing table, the list of destination networks or off-ramps reachable via all those highways, rose to 515,500 destinations for approximately seven minutes, as seen from a Level 3 BGP border peering router in Chicago.
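The numbers above are easy to sanity-check with a little arithmetic. A quick sketch (note that "512K" is the binary limit of 512 × 1024 entries, and that effective limits varied slightly by platform and configuration):

```python
# Back-of-the-envelope numbers behind the August 12, 2014 event.

# 32-bit Autonomous System numbers give roughly four billion possible ASNs.
asn_space = 2 ** 32
print(f"Possible ASNs: {asn_space:,}")            # 4,294,967,296

# "512K" is a binary limit: 512 * 1024 forwarding-table entries.
tcam_512k = 512 * 1024
print(f"512K limit: {tcam_512k:,} routes")        # 524,288

# The global table briefly climbed to ~515,500 routes. Devices whose
# effective IPv4 allocation sat at or near the 512K boundary overflowed.
peak = 515_500
print(f"Peak table size: {peak:,} routes")
print(f"Past 512,000 entries? {peak > 512_000}")  # True
```

The uncomfortable takeaway: even before the binary 524,288 ceiling was crossed, devices whose usable allocation was slightly lower were already out of room.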
As this occurred, a multitude of slightly older routers and switches around the world with well-known 512K destination forwarding-table limits started dropping prefixes and spreading that change to other devices. The event was triggered by a change in the de-aggregation level of the destinations coming from two of Verizon's networks, AS701 and AS705. Those two highways briefly announced an extra 10,000 to 15,000 destination prefixes until Verizon discovered and fixed the issue. At any given time, approximately 2,100 or more ASes are unreachable because of misconfiguration or other routine provisioning reasons. During this routing-table expansion, more than 6,000 ASes on the internet, over 12% of the highways, were unreachable. Those are significant and scary numbers.
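To see why de-aggregation inflates the table so quickly, consider what happens when a single aggregate route is split into more-specific announcements. A minimal sketch using Python's standard `ipaddress` module (the 198.18.0.0/16 range is just an example block reserved for benchmarking):

```python
import ipaddress

# A single aggregated /16 route covers the same address space as 256
# more-specific /24 routes. De-aggregating it multiplies the number of
# BGP table entries without making any new addresses reachable.
aggregate = ipaddress.ip_network("198.18.0.0/16")
more_specifics = list(aggregate.subnets(new_prefix=24))

print(len(more_specifics))   # 256
print(more_specifics[0])     # 198.18.0.0/24
print(more_specifics[-1])    # 198.18.255.0/24
```

De-aggregating even a handful of large blocks this way can add thousands of prefixes to the global table, which is consistent with the 10,000-to-15,000-prefix jump described above.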
In a Perfect World …
This serious glitch needn't have happened. Based on the natural destination-table growth we have been seeing on the internet for the last few years, the destination route-entry overload, along with its troublesome effects, was expected to occur sometime in the August-to-October time frame of this year. If everyone had played by the rules and operated as a best-practices citizen, we would be under 300,000 destinations and still fully reachable. But the internet, like the world it envelops, is at the mercy of imperfect politics and of operators scrambling to deal with the size and scale of their own networks, so we had better be prepared for more of these jolts.
Help is on the Way
That said, there are solutions and workarounds for this problem. If you have a Cisco-based router or switch exhibiting these hiccup symptoms, the following document describes ways that might resolve this issue in the short term:
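For reference, the workaround that circulated widely at the time for Cisco Catalyst 6500/7600 platforms reallocates forwarding-table (TCAM) space from IPv6 to IPv4. Treat the commands below as an illustration only; the exact command, values, and behavior vary by platform and software version, so confirm them against Cisco's own documentation before making any change. Note also that the new allocation takes effect only after a reload:

```
Router# show mls cef maximum-routes
Router# configure terminal
Router(config)# mls cef maximum-routes ip 1000
Router(config)# end
Router# reload
```

Because this trades IPv6 table space for IPv4 space, it is a stopgap, not a cure; the long-term answer is hardware with larger forwarding tables.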
Similar fixes exist for other vendors but have not yet been published. You can contact Copper River for help solving these support issues and implementing the above-mentioned Cisco solutions, or to identify and mitigate the possible impact of this kind of event on your network.
To Panic or Not to Panic
Just one more point to consider: this was a natural one-off—a non-malicious occurrence, generated by one service provider’s (Verizon’s) operational issue. On the other hand, researchers have predicted this type of situation could emerge as a potential attack on the internet itself.
For more information on this week’s internet event, please see the following links:
What caused today’s Internet hiccup:
Internet hiccups today? You’re not alone. Here’s why.
For more detailed information about the hardware issue that was the cause of the outage portion of this event please see the following link:
IPv4/IPv6 and TCAM memory:
For more information about the attack based vulnerabilities that could bring down the internet and how to mitigate or reduce them please see the following articles:
The cyber weapon that could take down the internet
How to crash the internet
BGP Spoofing — why nothing on the internet is actually secure
How the Internet went out in Egypt
Help is here. Call Copper River to discuss how we can help you face this and many other internet hiccups.
About Gary Hauser:
Mr. Gary Hauser has been working with large-scale service provider and commercial data centers for over 28 years. His experience spans every aspect of the design, implementation, configuration, and support of complex, mission-critical network infrastructure solutions for Global Fortune 100 companies and telephone companies in Europe and Asia, as well as North and South America.
His capabilities include an extremely broad knowledge base and familiarity with the latest cutting-edge technologies, including the following areas: carrier-class routing, switching, MPLS transport, network virtualization, software-defined networking, and optical technologies.
Gary joined Copper River's team as the Principal Architect in 2013, and in 2014 he was tasked with starting a new sales architecture team charged with delivering on the business promise of the virtual private cloud and infrastructure as a service by helping customers understand service-requirements planning for the cloud.
Gary holds multiple technical certifications, including JNCIE-SP (#12), JNCIE-ENT (#25), and CCIE (#4489, Ret.), and he is an IPv6 Forum Gold/Silver Certified Engineer. He earned his Bachelor of Science degree in Business Marketing and Management from American Intercontinental University and his Associate of Arts in Architectural Design from Anne Arundel Community College.