Monday, September 21, 2009

VL2: A Scalable and Flexible Data Center Network; Greenberg et al.

VL2 is another network architecture that tries to provide Ethernet semantics for increasingly large and unwieldy networks. It is set apart from other attempts such as PortLand and SEATTLE by its dedication to achieving agility, "the capacity to assign any server to any service," a goal the authors break down into three objectives: uniform high capacity between any two servers, performance isolation between services, and Layer-2 semantics. Modifications are made to the end hosts, not to the underlying network itself, and the design consists of existing technology (ECMP forwarding, IP anycast and multicast, link-state routing).

VL2 takes an approach similar to PortLand's, separating names from locations: switches and servers use IP addresses as location-specific addresses (LA), while applications have their own application-specific addresses (AA) that do not change, allowing continued service in the face of virtual machine migration. A small cluster of directory servers stores the AA-to-LA lookup information. Each server is outfitted with a VL2 agent that intercepts its host's packets, either queries the directory for an AA-to-LA mapping or takes one from its mapping cache, and sends the packet to the ToR with the corresponding LA.
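To make the lookup path concrete, here is a rough sketch of what the agent's resolve-then-forward logic might look like. This is not the paper's implementation: the `Directory` and `Vl2Agent` classes, the `invalidate` method, and the example addresses are all made up for illustration, and the "packet" is just a tuple with the ToR's LA in front.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Hypothetical stand-in for the directory servers: AA -> LA of the ToR."""
    mappings: dict[str, str]
    lookups: int = 0  # counts queries so cache hits are observable

    def lookup(self, aa: str) -> str:
        self.lookups += 1
        return self.mappings[aa]


@dataclass
class Vl2Agent:
    """Hypothetical per-server agent that resolves AAs and caches the result."""
    directory: Directory
    cache: dict[str, str] = field(default_factory=dict)

    def resolve(self, aa: str) -> str:
        # Use the cached mapping if present; otherwise ask the directory.
        if aa not in self.cache:
            self.cache[aa] = self.directory.lookup(aa)
        return self.cache[aa]

    def send(self, dst_aa: str, payload: bytes) -> tuple[str, str, bytes]:
        # "Encapsulate" by prefixing the LA of the destination's ToR.
        return (self.resolve(dst_aa), dst_aa, payload)

    def invalidate(self, aa: str) -> None:
        # Assumed mechanism for dropping a stale mapping, e.g. after migration.
        self.cache.pop(aa, None)


directory = Directory({"10.0.0.5": "20.0.1.1"})
agent = Vl2Agent(directory)
print(agent.send("10.0.0.5", b"hello"))  # first send queries the directory
print(agent.send("10.0.0.5", b"again"))  # second send is served from the cache

# The VM behind 10.0.0.5 migrates: its AA is unchanged, only the LA differs.
directory.mappings["10.0.0.5"] = "20.0.2.1"
agent.invalidate("10.0.0.5")
print(agent.send("10.0.0.5", b"after move"))
```

The point of the sketch is that the sender only ever names the AA; the LA is looked up on the side, which is why a migration only has to touch the directory and whatever caches hold the old mapping.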

VL2 was created specifically with data centers in mind and uses a Clos network, whose inherent layout provides redundancy to combat component failures. Targeting a specific realm instead of general networks seems to be a good path to take in the networking world where there are always tradeoffs for networks with different needs and purposes, and VL2 seems to be no exception. As data centers are already designed with special optimizations (e.g., large chunk sizes to amortize disk seeks), it makes sense that a network architecture could be developed with their performance needs in mind as well.

Unfortunately, traffic patterns appear to be highly random, and any given path is short-lived, because files and workloads are distributed randomly over the data center. That makes it difficult to come up with enhancements tailored to particular traffic flows. VL2 instead pairs Valiant Load Balancing (VLB) with TCP to achieve fairness and high utilization. VLB is realized by randomly selecting the intermediate switch through which to send traffic, and TCP is needed to handle the case where flows of different sizes are not distributed well and cause congestion at some links. VLB's randomization also helps provide isolation: applications that would otherwise flood the network have their loads spread evenly across it instead. The tradeoff is that randomized path selection does not lead to best-case performance for any particular workload, though it works better for supporting general applications.
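A toy simulation helps show both halves of that argument. The sketch below assumes one random intermediate switch is chosen per flow; the `assign_flows` function, the switch names, and the flow sizes are invented for the example and are not taken from the paper.

```python
import random
from collections import Counter


def assign_flows(flow_sizes, intermediates, rng):
    """Pick one intermediate switch uniformly at random for each flow.

    Returns (number of flows per switch, total bytes per switch).
    """
    flows = Counter()
    load = Counter()
    for size in flow_sizes:
        switch = rng.choice(intermediates)
        flows[switch] += 1
        load[switch] += size
    return flows, load


rng = random.Random(42)  # fixed seed only so reruns are repeatable
switches = ["int-1", "int-2", "int-3", "int-4"]

# Many small flows plus a few very large ones (sizes are arbitrary).
sizes = [1_000] * 10_000 + [50_000_000] * 5
flows, load = assign_flows(sizes, switches, rng)

for s in switches:
    print(s, "flows:", flows[s], "bytes:", load[s])
```

With many flows, the flow counts per switch come out close to even, which is the spreading effect described above. The byte totals can still be lopsided when a handful of large flows land on the same switch, which is the situation where TCP's congestion control has to pick up the slack.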

The evaluation was run on an actual prototype setup, though as a workload, the authors examined a worst-case scenario: all-to-all shuffle. VL2 does seem to provide fairness, isolation, and high goodput, but perhaps more promising is that for the same cost as an over-subscribed traditional network, this Clos network can be built without over-subscription.
