Wednesday, September 16, 2009

Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises; Kim, Caesar, & Rexford

SEATTLE is an alternative architecture meant to combat some of the inherent problems associated with Ethernet. With Ethernet, the control overhead in a network is proportional to the number of hosts, and forwarding is restricted to paths on a spanning tree. The problems addressed include large forwarding tables and control overhead, uneven link utilization because traffic concentrates near the root bridge of the spanning tree, poor throughput because of that same spanning tree, and frequent flooding. One possible solution is a hybrid approach that connects LANs via IP, each with an IP subnet of up to a few hundred hosts. However, initial configuration is time-consuming, reconfiguration is required whenever the network design changes, and a host cannot easily move while keeping its IP address. Even using VLANs to allow mobility still leaves the problems of manual configuration and poor link utilization due to a spanning tree.

To address all these problems, the authors present SEATTLE. SEATTLE keeps control overhead and forwarding tables proportional to the number of switches, not hosts. It uses a flat addressing scheme a la Ethernet, forwarding packets based on MAC addresses, yet scales to large networks: it runs a link-state protocol that maintains only switch-level information, which changes far less often than host information and therefore generates fewer control messages. SEATTLE uses a one-hop DHT to look up host information, which lengthens the path for at least the first packet of a flow. Switches have the option of caching the host information themselves, so that subsequent packets follow shortest paths, at the risk of ending up with a large, unwieldy forwarding table.
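To make the lookup concrete, here is a minimal Python sketch (my own construction, not the authors' code) of how a switch could map a host's MAC address to its resolver switch with consistent hashing, given only the switch list learned from link-state routing; the names and the hash choice are hypothetical:

```python
import hashlib

def _hash(value: str) -> int:
    """Map a switch ID or MAC address onto a numeric ring."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

def resolver_for(mac: str, switch_ids: list[str]) -> str:
    """Pick the switch whose hash is the closest successor of hash(mac)."""
    ring = sorted(switch_ids, key=_hash)
    h = _hash(mac)
    for sw in ring:
        if _hash(sw) >= h:
            return sw
    return ring[0]  # wrap around the ring

# Every switch computes the same resolver independently, with no flooding.
switches = ["sw-a", "sw-b", "sw-c", "sw-d"]
print(resolver_for("00:1a:2b:3c:4d:5e", switches))
```

Because every switch already knows the full switch list from link-state routing, this resolution needs no extra signaling beyond the single lookup message to the resolver.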

When the network changes, SEATTLE avoids flooding whenever possible, choosing instead to unicast updates to other switches and even to forward misdelivered packets on behalf of those switches. SEATTLE takes a somewhat lazy approach to correcting these stale forwarding entries, as sketched below.
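Here is a small Python sketch of that lazy correction, with naming entirely my own rather than the paper's: a switch that knows a moved host's current location re-forwards a misdelivered packet and unicasts a cache update back to the ingress switch, so no flood is needed.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    dst_mac: str          # destination host's MAC address
    ingress_switch: str   # switch that first handled the packet (holds the stale cache)
    payload: bytes

class StubNetwork:
    """Records unicasts so the example runs without a real data plane."""
    def __init__(self):
        self.sent = []
    def unicast(self, dst_switch, message):
        self.sent.append((dst_switch, message))

class Switch:
    def __init__(self, name: str):
        self.name = name
        self.host_locations: dict[str, str] = {}  # MAC -> switch currently attaching the host

    def handle_misdelivered(self, pkt: Packet, network) -> None:
        new_location = self.host_locations[pkt.dst_mac]
        # 1. Forward the packet on the stale sender's behalf so it is not dropped.
        network.unicast(new_location, pkt)
        # 2. Lazily correct the stale entry with a single unicast notification
        #    to the ingress switch, rather than flooding the whole network.
        network.unicast(pkt.ingress_switch,
                        ("location-update", pkt.dst_mac, new_location))

# Example: the switch knows the host moved to sw-new and quietly fixes things.
net = StubNetwork()
resolver = Switch("sw-resolver")
resolver.host_locations["00:1a:2b:3c:4d:5e"] = "sw-new"
resolver.handle_misdelivered(Packet("00:1a:2b:3c:4d:5e", "sw-ingress", b"data"), net)
print(net.sent)
```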

In evaluating their design, the authors strove to use real data, an endeavor at which they partly succeeded, relying on a mix of real traces and artificially augmented data. Running each experiment 25 times definitely seems like overkill to me; would the results differ much beyond 3 to 5 runs? Granted, the root bridge was randomly selected in each run, but if that choice alone changes network behavior so much, that seems like a flaw in the design: not being able to predict network behavior makes even simple performance predictions nearly impossible, though maybe that's no easier to do with Ethernet.

There is a possibility of unduly lengthened paths when a switch evicts a cached entry before its locally attached hosts are done using it, in which case the switch forwards the packets to a resolver switch instead of queueing them itself. This scenario can be prevented by using a long eviction timeout, with the drawback that the forwarding table could accumulate a very large number of entries; a toy sketch of that tradeoff follows.
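The sketch below (my own illustration, not from the paper) shows the timeout knob in question: a longer TTL keeps entries alive and traffic on shortest paths, while a shorter TTL forces more lookups through the resolver but keeps the table small.

```python
import time

class HostCache:
    """Cached MAC -> location entries with a TTL-based eviction timeout."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.entries: dict[str, tuple[str, float]] = {}  # MAC -> (switch, expiry time)

    def insert(self, mac: str, location: str) -> None:
        self.entries[mac] = (location, time.monotonic() + self.ttl)

    def lookup(self, mac: str):
        """Return the cached location, or None, which forces a detour via the resolver."""
        entry = self.entries.get(mac)
        if entry is None:
            return None
        location, expiry = entry
        if time.monotonic() > expiry:
            # Evicted too early: the next packets take the longer resolver path.
            del self.entries[mac]
            return None
        return location

# A longer ttl_seconds keeps traffic on shortest paths but lets the table grow;
# a shorter one bounds table size at the cost of more resolver detours.
cache = HostCache(ttl_seconds=60.0)
cache.insert("00:1a:2b:3c:4d:5e", "sw-new")
print(cache.lookup("00:1a:2b:3c:4d:5e"))
```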

SEATTLE addresses the biggest problems with Ethernet and with the hybrids built on it: the use of a spanning tree for paths, frequent flooding, and the manual configuration required by hybrid and VLAN schemes. However, I'm not convinced of the true need for SEATTLE; the only actual performance graphs included show control overhead, table size, and packet loss under switch failures and host mobility. The packet loss climbs to 30% as the switch failure rate increases to 1 (and only to 5% with 200 hosts moving per second), which seems unreasonably high to me.

2 comments:

Adrienne said...

Their error bars are pretty huge in some cases, which is probably why they did so many trials. They may have run 3-5 trials and had each of the trials look different, so they re-ran it until they had a reasonable mean and then plopped huge error bars on the graphs.

Randy H. Katz said...

It should be clear from our discussions in class so far that network traffic can be highly variable, and that the error bars are going to be big.

By the way, I thought that the methodology, including implementation in XORP, was a strength of the paper.