This paper discusses Resilient Overlay Networks (RON), created as a solution to the connectivity problems caused by BGP's slow recovery from faults. RON nodes continually monitor the quality of the paths (virtual links) between themselves and other nodes, and exchange this information to determine optimal routes according to some quality metric.
Because different applications have different routing needs in terms of latency, loss, and throughput, each group of clients (programs communicating with a node's RON software) can use its own metric to choose paths. The entry node, where a packet enters the overlay, identifies the needs of the incoming flow and selects a path from its routing table; it then tags the packet with a flow ID and sends it on its way. Subsequent RON nodes can use the flow ID tag to speed up forwarding of later packets in the same flow.
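To make the flow ID idea concrete, here is a minimal sketch of how a downstream node might cache forwarding decisions per flow. The class name, packet layout, and routing-table shape are my own assumptions for illustration; the paper's actual data structures may well differ.

```python
class RonNode:
    """Hypothetical overlay node that caches the next hop per flow ID."""

    def __init__(self, name, routing_table):
        self.name = name
        # Assumed shape: (destination, metric) -> next hop
        self.routing_table = routing_table
        self.flow_cache = {}   # flow_id -> next hop
        self.lookups = 0       # counts full routing-table lookups

    def tag(self, dst, metric, flow_id):
        # The entry node classifies the flow and attaches a flow ID.
        return {"dst": dst, "metric": metric, "flow_id": flow_id}

    def next_hop(self, packet):
        fid = packet["flow_id"]
        if fid not in self.flow_cache:
            # Slow path: first packet of the flow needs a full lookup.
            self.lookups += 1
            key = (packet["dst"], packet["metric"])
            self.flow_cache[fid] = self.routing_table[key]
        # Fast path: later packets reuse the cached decision.
        return self.flow_cache[fid]


node = RonNode("B", {("D", "latency"): "C"})
pkt = node.tag("D", "latency", flow_id=7)
first = node.next_hop(pkt)
second = node.next_hop(pkt)
```

Only the first packet of flow 7 triggers a routing-table lookup; the second is served from the cache.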
RON uses link-state routing, in which each node in the overlay sends information to every other node, which seems like a lot of wasteful overhead. However, since the problem being tackled is convergence time, the flooding is perhaps preferable to losing connectivity. Each RON node tracks three metrics for each virtual link (latency, loss, and throughput), and nodes probe virtual links frequently to ensure they are still usable.
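As a rough illustration of metric-based path selection over such a link-state table, the sketch below compares the direct path with paths through one intermediate node. Restricting the search to a single intermediate hop, the table layout, and the function name are simplifying assumptions of mine, and throughput is left out to keep it short.

```python
def best_path(links, src, dst, metric):
    """Pick the cheapest path among the direct link and one-hop detours.

    links: {(a, b): {"latency": milliseconds, "loss": fraction in [0, 1]}}
    metric: "latency" or "loss"
    """
    def cost(path):
        hops = list(zip(path, path[1:]))
        if any(h not in links for h in hops):
            return float("inf")  # a missing link makes the path unusable
        if metric == "latency":
            # Latencies add up along the path.
            return sum(links[h]["latency"] for h in hops)
        # Loss: probability that at least one hop drops the packet.
        delivered = 1.0
        for h in hops:
            delivered *= 1.0 - links[h]["loss"]
        return 1.0 - delivered

    nodes = {n for pair in links for n in pair}
    candidates = [(src, dst)]
    candidates += [(src, mid, dst) for mid in sorted(nodes - {src, dst})]
    return min(candidates, key=cost)


# Made-up measurements for three nodes.
links = {
    ("A", "C"): {"latency": 100, "loss": 0.00},
    ("A", "B"): {"latency": 20, "loss": 0.05},
    ("B", "C"): {"latency": 30, "loss": 0.05},
}
low_latency = best_path(links, "A", "C", "latency")
low_loss = best_path(links, "A", "C", "loss")
```

With these numbers the latency-sensitive flow detours through B, while the loss-sensitive flow stays on the direct link, which is the point of letting each client pick its own metric.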
In evaluating RON, the authors found that in one scenario RON was able to fully overcome outages (defined as packet loss rates above some given threshold for a certain length of time), but overcame only up to 60% of outages in the second set. The authors don't discuss this difference much; they brush it off as differences in the setup and claim that the remaining 40% were due to sites not being reachable by any other RON site. It's not clear to me that the connectivity problem posed by BGP's slow recovery justifies the use of RON, which will not scale well and might also not play well with NATs.
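For reference, the outage definition used in the evaluation (loss above a threshold, sustained for some length of time) can be sketched as a scan over per-interval loss rates. The threshold, the minimum duration, and the sample values below are placeholders I picked, not the paper's numbers.

```python
def find_outages(loss_samples, threshold, min_intervals):
    """Return (start, end) index pairs, end exclusive, of sustained high loss."""
    outages = []
    start = None
    for i, loss in enumerate(loss_samples):
        if loss > threshold:
            if start is None:
                start = i  # a high-loss run begins here
        else:
            if start is not None and i - start >= min_intervals:
                outages.append((start, i))
            start = None
    # Handle a run that lasts until the end of the trace.
    if start is not None and len(loss_samples) - start >= min_intervals:
        outages.append((start, len(loss_samples)))
    return outages


samples = [0.0, 0.5, 0.6, 0.7, 0.1, 0.9, 0.0, 0.4, 0.4]
outages = find_outages(samples, threshold=0.3, min_intervals=2)
```

The single high-loss interval at index 5 is too short to count, so only the two sustained runs are reported.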