Monday, September 21, 2009

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

This paper discusses PortLand, an Ethernet-compatible set of routing, forwarding, and address resolution protocols intended for use in data centers. Data centers are growing at enormous rates, and protocols designed for general-purpose LANs no longer scale well to them. To that end, the authors lay out goals for the new protocol to meet: mobility, self-configuration, loop-free connectivity, and failure detection.

PortLand is specialized to a fat tree network topology. Having this specific layout in mind allows PortLand to use a hierarchical addressing scheme that encodes location information in a pseudo-MAC address (PMAC). Hosts still keep their flat, traditional MAC addresses as names; the PMAC is layered on top of that and encodes a host's location, including the pod it belongs to.
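To make the idea of a location-encoding address concrete, here is a minimal sketch of packing and unpacking a PMAC. The field layout (16-bit pod, 8-bit position, 8-bit port, 16-bit vmid) is how I recall the paper describing it, so treat it as an assumption; the function names `encode_pmac` and `decode_pmac` are hypothetical and not from PortLand itself.

```python
# Assumed PMAC layout (48 bits total): pod (16) | position (8) | port (8) | vmid (16)
FIELDS = (("pod", 16), ("position", 8), ("port", 8), ("vmid", 16))


def encode_pmac(pod, position, port, vmid):
    """Pack the four location fields into a colon-separated 48-bit address."""
    values = {"pod": pod, "position": position, "port": port, "vmid": vmid}
    raw = 0
    for name, bits in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
        raw = (raw << bits) | value
    # Emit the six bytes from most significant to least significant.
    return ":".join(f"{(raw >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))


def decode_pmac(pmac):
    """Recover the location fields from a colon-separated PMAC string."""
    raw = int(pmac.replace(":", ""), 16)
    out = {}
    shift = 48
    for name, bits in FIELDS:
        shift -= bits
        out[name] = (raw >> shift) & ((1 << bits) - 1)
    return out


pmac = encode_pmac(pod=2, position=1, port=0, vmid=1)
print(pmac)               # 00:02:01:00:00:01
print(decode_pmac(pmac))  # {'pod': 2, 'position': 1, 'port': 0, 'vmid': 1}
```

Because the pod occupies the leading bits, a switch can forward on a prefix of the address rather than keeping one entry per host, which is the point of making the address hierarchical.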

A centralized fabric manager (FM) maintains network configuration information and helps resolve IP-to-PMAC mappings. The danger in centralizing such an important function is FM failure, though presumably the FM would be replicated to or spread across multiple nodes. In general this setup prevents mass broadcasts of ARP packets: edge switches intercept ARPs and use them to query the fabric manager instead, and the FM only resorts to a broadcast when it doesn't have the IP-to-PMAC mapping cached. Even more important is the amount of computation required of the FM as the number of ARPs sent per host increases; up to 70 cores' worth of CPU is projected for large numbers of hosts each sending 100 ARPs/sec. As data centers are scaling out very quickly, it makes sense to watch out for this computational overload on the FM and perhaps develop it as a cluster sooner rather than later.
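The ARP path described above can be sketched roughly as follows. This is only an illustration of the lookup-then-fallback logic, not the paper's implementation: the `FabricManager` class, `handle_arp`, and the `broadcast` callback are all hypothetical names, and real switches and the FM would exchange messages over the network rather than make function calls.

```python
class FabricManager:
    """Toy stand-in for the fabric manager's IP-to-PMAC table."""

    def __init__(self):
        self._ip_to_pmac = {}

    def register(self, ip, pmac):
        self._ip_to_pmac[ip] = pmac

    def resolve(self, ip):
        # Returns None when the mapping is not cached.
        return self._ip_to_pmac.get(ip)


def handle_arp(fm, target_ip, broadcast):
    """What an edge switch does with an intercepted ARP request.

    `broadcast` is a hypothetical fallback that floods the request and
    returns the PMAC from the reply, or None if nobody answers.
    """
    pmac = fm.resolve(target_ip)
    if pmac is not None:
        return pmac  # Common case: answered without any broadcast.
    pmac = broadcast(target_ip)
    if pmac is not None:
        fm.register(target_ip, pmac)  # Cache so the next request avoids a broadcast.
    return pmac


fm = FabricManager()
fm.register("10.2.1.5", "00:02:01:00:00:01")
broadcasts = []


def fake_broadcast(ip):
    broadcasts.append(ip)
    return "00:03:00:01:00:01" if ip == "10.3.0.9" else None


print(handle_arp(fm, "10.2.1.5", fake_broadcast))  # cached, no broadcast
print(handle_arp(fm, "10.3.0.9", fake_broadcast))  # miss, one broadcast
print(handle_arp(fm, "10.3.0.9", fake_broadcast))  # now cached
print(len(broadcasts))                             # 1
```

Every ARP in the network funnels through `resolve`, which is why the per-host ARP rate translates so directly into CPU load on the FM.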

PortLand uses a Location Discovery Protocol to self-configure; switches send Location Discovery Messages to one another, which lets each switch infer its location in the tree from the messages it receives on particular ports. A very cool property of this protocol, paired with the known fat tree structure, is that it can actually pick out certain failure conditions like miswiring.
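As a rough illustration of how a switch might infer its tree level from which ports hear Location Discovery Messages, here is a heavily simplified sketch. It assumes a k-port fat tree in which hosts never send LDMs, so an edge switch hears them on at most half its ports; the function `infer_level` and its inputs are hypothetical, and the real protocol also involves timeouts and position and pod assignment that are left out here.

```python
def infer_level(num_ports, neighbor_levels):
    """Guess a switch's level in a fat tree from the LDMs it has heard.

    neighbor_levels maps port number -> the level the neighbor advertised
    ("edge", "aggregation", "core"), or None if the neighbor has not
    decided yet. Ports that heard no LDM are simply absent.
    """
    heard = len(neighbor_levels)
    # Hosts do not send LDMs, so a switch with host-facing ports hears
    # LDMs on at most half of its ports.
    if heard <= num_ports // 2:
        return "edge"
    levels = set(neighbor_levels.values())
    # Only aggregation switches sit directly above edge switches.
    if "edge" in levels:
        return "aggregation"
    # A core switch connects to aggregation switches on every port.
    if heard == num_ports and levels == {"aggregation"}:
        return "core"
    # Either neighbors are still undecided, or the wiring does not match
    # the expected fat tree shape.
    return "unknown"


k = 4
print(infer_level(k, {0: None, 1: None}))                        # edge
print(infer_level(k, {0: "edge", 1: "edge", 2: None, 3: None}))  # aggregation
print(infer_level(k, {p: "aggregation" for p in range(k)}))      # core
print(infer_level(k, {0: "aggregation", 1: "aggregation",
                      2: "aggregation", 3: "core"}))             # unknown
```

The last case hints at how miswiring could be flagged: a core switch that hears another core switch on one of its ports does not fit any valid fat tree role.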
