Wednesday, December 9, 2009

BotGraph: Large Scale Spamming Botnet Detection; Zhao, Xie, Yu, Ke, Yu, Chen, & Gillum

This paper describes BotGraph, a system for detecting the Web-account abuse traffic generated by bots. Rather than trying to classify individual nodes, it detects a botnet at the collective scale by exploiting the configuration similarities among a botnet's nodes. Detecting this abuse comes down to detecting aggressive account signups and aggressive logins.

Detecting botnet signups is based on spikes in signups from a given IP address. Locating all bots in a bot-user group is done with a user-user graph: vertices are users, and weighted edges between them capture the similarity of their activity. The authors use a DryadLINQ-based system to construct these graphs from large volumes of data in parallel. A nice property of this approach is that if botnets adaptively limit their signups or e-mails sent per bot to evade the graph analysis, their activity has been scaled back far enough that they no longer pose nearly the problem they did before.
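The post doesn't spell out the spike detector, so here is a minimal sketch in Python using an exponentially weighted moving average over per-IP daily signup counts; the smoothing factor and spike threshold are illustrative choices of mine, not the paper's tuned values.

```python
def detect_signup_spikes(daily_counts, alpha=0.2, threshold=5.0):
    """daily_counts: signup counts for one IP address, oldest first.
    Returns the day indices whose count jumps well above the EWMA
    of the preceding history."""
    spikes = []
    ewma = float(daily_counts[0])
    for day, count in enumerate(daily_counts[1:], start=1):
        if count > threshold * max(ewma, 1.0):
            # Don't fold spike days into the history, so a sustained
            # signup burst stays flagged rather than becoming "normal".
            spikes.append(day)
        else:
            ewma = alpha * count + (1 - alpha) * ewma
    return spikes

# A quiet IP that suddenly starts signing up ~80 accounts a day:
print(detect_signup_spikes([2, 1, 3, 2, 2, 80, 75]))  # -> [5, 6]
```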
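And a sketch of the user-user graph itself, assuming the edge weight between two users is the number of distinct IP addresses both have logged in from (the paper weights edges by shared login IPs); the weight threshold here is illustrative, and connected components stand in for candidate bot-user groups.

```python
from collections import defaultdict
from itertools import combinations

def bot_user_groups(login_records, min_weight=2):
    """login_records: iterable of (user, ip) pairs. Returns the
    connected components of the graph linking users who share at
    least min_weight login IPs."""
    users_by_ip = defaultdict(set)
    for user, ip in login_records:
        users_by_ip[ip].add(user)

    # Count shared IPs per user pair. The paper does this pairwise
    # step in parallel with DryadLINQ over partitioned data.
    shared = defaultdict(int)
    for users in users_by_ip.values():
        for u, v in combinations(sorted(users), 2):
            shared[(u, v)] += 1

    # Union-find over edges that meet the weight threshold.
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for (u, v), weight in shared.items():
        if weight >= min_weight:
            parent[find(u)] = find(v)

    groups = defaultdict(set)
    for pair in shared:
        for user in pair:
            groups[find(user)].add(user)
    return [g for g in groups.values() if len(g) > 1]

records = [("alice", "1.2.3.4"), ("bob", "1.2.3.4"),
           ("alice", "5.6.7.8"), ("bob", "5.6.7.8"),
           ("carol", "9.9.9.9")]
print(bot_user_groups(records))  # -> [{'alice', 'bob'}]
```

The paper additionally refines large components by re-applying the analysis within them at higher weight thresholds, which this single-pass sketch skips.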

Sunday, December 6, 2009

Not-a-Bot: Improving Service Availability in the Face of Botnet Attacks; Gummadi, Balakrishnan, Maniatis, & Ratnasamy

Botnets are responsible for a large portion of all undesirable web traffic, such as spam, DDoS attacks, and click-fraud, which could be reduced by schemes that correctly classify traffic as human-generated or bot-generated. Such schemes, however, must not trust potentially infected hosts and must be as transparent as possible to users. This paper describes Not-a-Bot (NAB), a human-activity attestation system: an attester, rooted in a Trusted Platform Module chip and implemented in the Xen hypervisor, runs on each client, and a verifier module runs at any server wishing to use the attestations. NAB attempts to reduce bot-generated traffic while leaving human-generated traffic unaffected.
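As a rough illustration of the server side, here is a minimal sketch of what such a verifier might check, assuming the attestation is a (timestamp, signature) pair bound to the request bytes; NAB's real attestations are TPM-rooted public-key signatures, so the shared-key HMAC and the freshness window below are stand-ins of mine.

```python
import hashlib
import hmac
import time

ATTESTATION_WINDOW = 60.0  # seconds; illustrative freshness bound

def verify_attestation(request_bytes, attestation, attester_key):
    """Return True if the attestation is fresh and binds to the
    exact request content."""
    timestamp, signature = attestation
    if time.time() - timestamp > ATTESTATION_WINDOW:
        return False  # too stale to reflect recent human activity
    expected = hmac.new(attester_key,
                        repr(timestamp).encode() + request_bytes,
                        hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)
```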

An interaction is attested as human-generated based on how closely recent activity from the keyboard and mouse ports matches the web traffic about to be sent. If no matching recent activity can be found, the request simply goes out without an attestation. Bots could try to replicate user behavior in the messages they send, but NAB rate-limits them by requiring an application-specific amount of time between attestations. The attester then signs the request with a TPM-rooted key and sends it on, where it can be checked by the verifier at the destination.
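Putting that paragraph into code, here is a minimal sketch of the client-side attester; the one-second activity window, the per-application rate limits, and the sign callable are placeholders rather than values from the paper, with sign standing in for the TPM-backed signing.

```python
import time

# Illustrative per-application minimum seconds between attestations.
RATE_LIMITS = {"smtp": 1.0, "http": 0.1}

class Attester:
    """Attest a request only if keyboard/mouse input was seen
    recently, and rate-limit attestations per application."""

    def __init__(self, activity_window=1.0):
        self.activity_window = activity_window  # seconds
        self.last_input = 0.0
        self.last_attested = {}

    def on_input_event(self):
        # Hooked into the keyboard/mouse input path by the
        # hypervisor, outside the untrusted guest OS.
        self.last_input = time.time()

    def attest(self, app, request_bytes, sign):
        now = time.time()
        if now - self.last_input > self.activity_window:
            return None  # no recent human activity: decline to attest
        if now - self.last_attested.get(app, 0.0) < RATE_LIMITS.get(app, 1.0):
            return None  # rate-limited: bots can't mint attestations quickly
        self.last_attested[app] = now
        # Bind the attestation to the exact request content, so a bot
        # can't reuse it for a different message.
        return sign(request_bytes)
```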

The authors evaluate their scheme on actual activity traces from several hundred users, as well as malware traces and spam messages. They find that NAB suppresses spam, DDoS attempts, and click-fraud by around 90%, without denying any human-generated traffic.

In reading about the scenarios in which NAB would actually be applied, I wondered how many average e-mail users would really want to deploy it. The authors address this question by arguing that e-mail users would benefit from having all their mail correctly classified. Personally, though, I don't feel much affected by misclassification of my e-mail, so I still have a hard time accepting that NAB could catch on widely. The DDoS scenario is even less effective in justifying NAB, particularly since some legitimate web requests are machine-generated. As for detecting click-fraud, any company earning ad revenue could make use of NAB, but the benefit to users is low; I particularly disliked the idea of an attester bundled with installed toolbars.