The Problem

Software bugs in routers lead to network outages, security vulnerabilities, and other unexpected behavior. Rather than simply crashing the router, bugs can violate protocol semantics, rendering traditional failure detection and recovery techniques ineffective.

The Approach

Our approach is to eliminate bugs virtually (with the use of virtualization technologies). We design a bug-tolerant router (BTR), which masks buggy behavior and keeps it from affecting the correctness of the network layer.

This is achieved by running multiple, diverse instances of router software in parallel. The main questions then become how diversity is achieved and how it will be deployed. First, the types of diversity that are possible:

  • Implementation - running software from different code bases (e.g., Quagga, XORP, Bird).
  • Version - running multiple versions of the same code base (e.g., v0.98.6 and v0.99.11).
  • Update order/timing - changing the order and timing of update messages.
  • Connection establishment order/timing - changing the timing and order of connection establishment.
  • Protocol - running different protocols in parallel that will each choose the same route.
  • Execution environment - e.g., operating system, memory layout.
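As an illustration of update order diversity, the following minimal sketch gives each router instance a different ordering of the same batch of update messages. It is not taken from the prototype: the `Update` tuple shape, the function name `diversify_order`, and the choice to keep updates for the same prefix in their original relative order are all assumptions made for this example.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical message shape: (prefix, payload).
Update = Tuple[str, str]


def diversify_order(updates: List[Update], seed: int) -> List[Update]:
    """Return a reordering of `updates` that keeps per-prefix order intact."""
    rng = random.Random(seed)

    # Queue the updates for each prefix in their original order.
    queues: Dict[str, List[Update]] = {}
    for upd in updates:
        queues.setdefault(upd[0], []).append(upd)

    # One slot per update, labelled by prefix. Shuffling the slots decides
    # which prefix emits its next queued update at each step.
    slots = [upd[0] for upd in updates]
    rng.shuffle(slots)

    positions = {prefix: 0 for prefix in queues}
    result: List[Update] = []
    for prefix in slots:
        result.append(queues[prefix][positions[prefix]])
        positions[prefix] += 1
    return result


if __name__ == "__main__":
    batch = [
        ("10.0.0.0/8", "announce-1"),
        ("10.0.0.0/8", "announce-2"),
        ("192.168.0.0/16", "announce-1"),
        ("172.16.0.0/12", "withdraw-1"),
    ]
    # Each (hypothetical) instance gets its own seed and thus its own order.
    for instance_seed in (1, 2, 3):
        print(instance_seed, diversify_order(batch, instance_seed))
```

Using a per-instance seed keeps each ordering reproducible, which may help when trying to replay a sequence that triggered a bug.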

Different deployment scenarios are possible. An open source router vendor may run XORP, Quagga, and Bird together, which by itself provides sufficient diversity. A closed source vendor may have various internal development trains, or software obtained through an acquisition, and make use of version and data diversity. A network operator might use implementation diversity (using commercial router software) and protocol diversity, along with version diversity and open source software.

The Prototype

We have built a prototype running on Linux that is transparent to peer routers (they do not know it's a BTR). We did not modify any router source code, choosing instead to intercept all socket-based communication (between peers, and between the router and the kernel) and redirect it through our hypervisor, which performs the voting and fault detection/recovery. We tested with Quagga, XORP, and Bird.
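The voting step can be illustrated with a minimal sketch. This is not the prototype's code: the function name `majority_vote`, the instance labels, and the representation of each instance's output as a string are assumptions made for illustration. It shows one simple strategy (strict majority) for choosing an output and flagging the instances that disagree with it.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def majority_vote(outputs: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    """Pick the output produced by a strict majority of instances.

    `outputs` maps a (hypothetical) instance name to the output it produced,
    for example the next hop it selected for a prefix. Returns the winning
    output (or None if no strict majority exists) and the sorted names of
    the instances that disagreed with it.
    """
    if not outputs:
        return None, []

    counts = Counter(outputs.values())
    winner, votes = counts.most_common(1)[0]

    if votes * 2 <= len(outputs):
        # No strict majority: report no winner and flag no instance.
        return None, []

    dissenters = sorted(name for name, out in outputs.items() if out != winner)
    return winner, dissenters


if __name__ == "__main__":
    chosen, suspects = majority_vote(
        {"quagga": "10.0.0.1", "xorp": "10.0.0.1", "bird": "10.0.0.2"}
    )
    print(chosen, suspects)
```

In this sketch the dissenting instances are only reported; what to do with them (for example, restarting them) would belong to the fault detection/recovery logic.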

Documents

Eric Keller*, Minlan Yu*, Matthew Caesar, Jennifer Rexford, Virtually Eliminating Router Bugs, 5th ACM International Conference on emerging Networking EXperiments and Technologies (CoNEXT), December 2009. (* indicates alphabetical order) (pdf)

Presentation at NANOG 46 in Philadelphia, PA. June 14-17, 2009. (ppt)

Matthew Caesar, Jennifer Rexford, Building Bug-Tolerant Routers with Virtualization, ACM SIGCOMM Workshop on Programmable Routers for the Extensible Services of Tomorrow (PRESTO), August 2008. (pdf)

Funding

The project is funded by NSF under grant CNS-0831646.