Tuesday, September 8, 2009

Congestion Avoidance and Control; Jacobson & Karels

This paper discusses congestion avoidance and control mechanisms for TCP, including "slow start" and "additive increase, multiplicative decrease," two methods that can be used alone or, as they are in practice, together. Slow start exists to get TCP's "self-clock" running: at equilibrium, the arrival of each ack paces the transmission of new data, and slow start reaches that equilibrium gradually by opening the congestion window one segment for every ack received, so the sender effectively sends two packets per ack and the window doubles each round trip. This strategy gives better throughput over the long run and prevents the bursts that cause multiple retransmissions of packets. An added advantage is that slow start is not a complicated algorithm to implement, and the window size makes it easy to bound how much buffering is needed at the bottleneck in the network.
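
Here is a minimal sketch of that per-ack growth rule; the variable names (cwnd and so on) are my own choices rather than the paper's code, and loss handling is left out to keep the doubling visible:

    /* Slow-start window growth, in whole segments.
     * Each ack opens the window by one segment, so the
     * window doubles every round trip: 1, 2, 4, 8, ... */
    #include <stdio.h>

    int main(void) {
        int cwnd = 1;                   /* congestion window, in segments */
        for (int rtt = 0; rtt < 5; rtt++) {
            int acks_this_rtt = cwnd;   /* one ack per segment sent */
            for (int i = 0; i < acks_this_rtt; i++)
                cwnd += 1;              /* open window one segment per ack */
            printf("after RTT %d: cwnd = %d\n", rtt + 1, cwnd);
        }
        return 0;
    }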

Besides slow start, TCP also uses a congestion avoidance algorithm consistent with Jain and Chiu's recommendation: additive increase, multiplicative decrease. I thought this paper did a nicer job than the Jain/Chiu paper of offering the intuition for this combination: the network takes a much greater hit from overshooting than from underuse, so caution is the best strategy. The best performance comes from probing upward in small additive increments to find available bandwidth, and from cutting the window multiplicatively (repeated cuts decay it exponentially) to drain the packets that queue up once the network is overloaded. Using packet loss as the sign of congestion that triggers the window decrease seems pretty reasonable, though it would obviously be convenient for the hosts if the underlying network could just send back a number specifying a concrete amount of bandwidth remaining for each of them to use.
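
A sketch of the AIMD rule itself, assuming the paper's constants (one segment of increase per round trip, halving on loss); the loss pattern below is invented purely for illustration:

    /* Additive increase, multiplicative decrease:
     * grow the window by one segment per round trip,
     * halve it whenever a loss signals congestion. */
    #include <stdio.h>

    int main(void) {
        double cwnd = 1.0;
        for (int rtt = 1; rtt <= 20; rtt++) {
            int loss = (rtt % 8 == 0);     /* pretend a drop every 8th RTT */
            if (loss)
                cwnd = cwnd / 2.0;         /* multiplicative decrease */
            else
                cwnd += 1.0;               /* additive increase */
            printf("rtt %2d: cwnd = %4.1f%s\n",
                   rtt, cwnd, loss ? "  (loss)" : "");
        }
        return 0;
    }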

This congestion avoidance is easily confused with slow start; when should a sender use which? The appendices show how to combine the two. When a timeout occurs, the sender records half of the window size in effect at the timeout as a threshold, then restarts slow start from a window of one packet; slow start doubles the window each round trip until it reaches that threshold, and from there the sender probes incrementally with additive increase. The intuition is that half the pre-timeout window was probably a "safe" size within the bounds of our resources, so the sender should ramp up quickly to that point and then slowly and carefully probe for the maximum bandwidth beyond it. (Ironically, "slow start" is the faster incrementing method in this case.)
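
A sketch of how those two phases fit together after a timeout; the name ssthresh follows later TCP usage rather than the paper's listing, and the starting window is arbitrary:

    /* Combined scheme: on timeout, remember half the old window
     * as a threshold, restart from one segment with slow start,
     * then switch to additive increase past the threshold. */
    #include <stdio.h>

    int main(void) {
        int cwnd = 16;               /* window when the timeout occurred */
        int ssthresh = cwnd / 2;     /* "safe" size: half the old window */
        cwnd = 1;                    /* restart from one segment */
        for (int rtt = 1; rtt <= 10; rtt++) {
            const char *phase;
            if (cwnd < ssthresh) {
                cwnd *= 2;           /* slow start: double each RTT */
                phase = "slow start";
            } else {
                cwnd += 1;           /* avoidance: probe one segment/RTT */
                phase = "avoidance";
            }
            printf("rtt %2d: cwnd = %2d (%s)\n", rtt, cwnd, phase);
        }
        return 0;
    }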

Another change this paper recommends is a modification to the retransmit timer, namely taking the round-trip time variance into account to compute an adaptive timeout instead of using a hard-coded multiple of the mean. The paper argues that exponential backoff of the timer after repeated retransmissions is the best all-around strategy, but what if we have more specific information about what's going on in the network? Later strategies such as "fast retransmit" effectively bypass the timer when duplicate acks make it apparent that a packet has been dropped. It's possible in these cases to generate a number of gratuitous retransmits, so in the end there's a tradeoff between unnecessary packets in the network and fast recovery.
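
A sketch of a variance-aware timer in the spirit of the paper's Appendix A, which tracks the smoothed RTT and its mean deviation and sets the timeout from both; the gains (1/8, 1/4) and the factor of 4 follow the now-standard choice codified in RFC 6298 rather than necessarily the paper's exact constants, and the RTT samples are invented:

    /* Adaptive retransmit timer: exponentially weighted estimates
     * of the mean RTT and its mean deviation; the timeout is the
     * mean plus a multiple of the deviation. */
    #include <stdio.h>

    int main(void) {
        double samples[] = {100, 110, 95, 300, 105, 98};  /* ms, invented */
        double srtt = samples[0];         /* smoothed RTT estimate */
        double rttvar = samples[0] / 2;   /* mean deviation estimate */
        for (int i = 1; i < 6; i++) {
            double err = samples[i] - srtt;
            srtt += 0.125 * err;                                 /* gain 1/8 */
            rttvar += 0.25 * ((err < 0 ? -err : err) - rttvar);  /* gain 1/4 */
            double rto = srtt + 4.0 * rttvar;                    /* timeout */
            printf("sample %3.0f ms -> srtt %6.1f, rttvar %5.1f, rto %6.1f ms\n",
                   samples[i], srtt, rttvar, rto);
        }
        return 0;
    }

Note how the one outlier sample (300 ms) inflates the deviation term and pushes the timeout up sharply, which is exactly the behavior that protects against spurious retransmissions when delay becomes variable.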

This paper is important because it formed the basis for the congestion control mechanisms that TCP still uses today.
