Wednesday, November 26, 2008

Deconstructing Process Isolation

This paper adds on to the afore-mentioned Singularity paper by presenting comparative costs of Singularity's selective use of hardware isolation. In a number of benchmarks, the number of cycles was very close for all the setups with SIPs and HIPs running in Ring 0; for the most part, the Ring 3 HIPs performed comparably as well, except for the small handful in which each process ran in an unprivileged non-kernel domain in Ring 3 and encountered 10x the number of cycles. This latter setup is apparently the one used in most hardware-isolated microkernel systems.

Singularity: Rethinking the Software Stack, Hunt & Larus

The Singularity project was started to re-examine design decisions of modern operating systems. The project consists of a microkernel operating system, C#-based programming language, and verification tools meant to address such shortcomings as security vulnerabilities, lack of robustness, and unexpected application interactions and failures. The three major features of Singularity are software-isolated processes (SIPs), use of contract-based channels for communication, and manifest-based programs (MBPs). SIPs were designed such that they could be easily garbage-collected; SIPs can only have pointers to their own memory or their memory on the exchange heap.

The authors place a huge amount of importance on safe, verified code, leading them to create Sing#. The overhead associated with run-time checking of the language accounts for an additional 4-5% of cycles, but the authors deem that an acceptable trade-off. It was interesting to note that the manifest-based configuration code syntax seems to be approaching Perl with its multiple dollar signs and forall keyword. Another interesting point was the distrust of the compiler Bartok and its ability to compile MSIL assuredly into safe native code.

Monday, November 24, 2008

Two-Phase Commit

This looks like another one of those awesome '80s database papers. Woo, R*! I think I'll pass.

** Update (Nov. 29, 2010): For some reason this blog entry seems to be attracting lots of traffic. Sadly, I don't have a review to offer of this paper, but I can share a quote and some advice for you if you want to learn more about Two-Phase Commit.

"Two-phase commit: isn't that just like a simplified Paxos?" My advice: read the original Two-Phase Commit paper by Mohan, Lindsay, and Obermarck. Then read Lamport's "Paxos Made Simple." Then re-read "Two-Phase Commit" and enjoy how easy the algorithm is.

Paxos Made Simple, Lamport

Paxos is a distributed consensus algorithm for a network of processes in which each process plays the role of proposer, acceptor, and learner. The algorithm is carried out in two phases; in the first phase, a proposer selects a proposal number and sends a prepare request to some majority of acceptors. An acceptor will only accept this proposal if it has not seen a larger proposal number, and if it accepts, it responds with the highest-numbered proposal it has previously seen. Before sending this response, the acceptor records its intended response to stable storage so that consensus can still be reached in case of failure.

In the second phase, the proposer awaits responses to its prepare requests. If a majority of the acceptors respond, then the proposer subsequently sends out an accept request for the value of the highest-numbered response it received. Each acceptor then accepts the value sent to it unless it has already responded to a prepare request for a larger value. The purpose of these carefully designed phases is to enforce the rule that only one value will be chosen solely from proposed values. The possibility of failure of one acceptor created a need for multiple acceptors, which then created a need for reaching consensus among these multiple acceptors. This paper's abstract was an effective attention-grabber. I can only imagine how painful the original paper was when Paxos was not explained in simple terms.

Wednesday, November 19, 2008

Implementing Declarative Overlays

P2 is a system used for the declarative construction of networks. P2's language OverLog can be used to specify overlays very concisely, requiring only 16 rules to express a Narada-style network and 47 for a Chord overlay. However, the conciseness of OverLog does not necessarily speak for the learnability of the language. Better readability (a la SQL) would help its popularity, particularly since OverLog's declarative nature is already somewhat foreign to the typical imperative-language programmer.

P2's modularity is useful in that unlike PIER, P2 can be used with a variety of underlying networks, building up a knowledge of the network from a small set of nearby neighbor addresses. Its architecture is based on Click, though P2 differs from Click in that P2 elements are meant to act as database operators rather than packet routing functions. Performance testing of P2 Chord overlays revealed expected performance on par with those of Chord itself. The results of hop-count, latency, and performance under churn were "acceptable" as the authors didn't expect to compete with hand-tuned implementations.

Friday, November 14, 2008

K42: Building a Complete Operating System

This paper discusses K42, a highly-customizable object-oriented operating system. K42 was designed to address some technology predictions made by the authors, including the rise of multiprocessors and 64-bit machines. One of the goals of K42 was the ability to scale up with multiprocessors, which was achieved through minimizing sharing using techniques like Protected Procedure Calls and clustered objects, and affected the design of memory management. K42's design places a lot of importance on locality and customizability; the latter feature is a really interesting one in that K42 allows each instance of a resource to be managed by different sets of object instances.

Also supported is dynamic customization through hot swapping and dynamic upgrade. K42 reduces system calls and storage/pinned memory in the kernel by moving thread-scheduling into user-space. However, this user-level scheduling creates problems when using pthreads. Though Windows was predicted to dominate the market, the authors decided to provide Linux compatibility in K42.

Wednesday, November 12, 2008

Lessons from Giant-Scale Services, Brewer

This paper discusses giant-scale services and their advantages, as well as maintaining high availability. The giant-scale services mentioned here are single-site, single-owner, well-connected clusters and provide ubiquitous infrastructure that allow users to access services and centralized data from multiple devices. The paper discusses load management and high availability as major factors in the design of giant-scale services.

The availability metrics of uptime, yield, and harvest, as well as the DQ principle are brought up in deciding whether replication or partitioning is a better scheme for increasing availability. The author agrees with the traditionally held viewpoint that replication is the better strategy, although he suggests that some combination of partitioning and replication could give finer recovery control. Good strategies for graceful degradation, disaster tolerance, and online evolution are crucial for high availability. The author mentions that he has developed tools to help design giant-scale systems, but the paper delves more into metrics and high-level topics instead of said tools.

Monday, November 10, 2008

The Impact of DHT Routing Geometry on Resilience and Proximity

This paper discusses DHTs and compares the performance of different routing designs. The paper focuses on routing geometry rather than algorithmic details. Flexibility, that is, the amount of freedom available in choosing neighbors and next-hop paths, is a direct consequence of the geometry and in turn affects such properties as static resilience, path latency, and local convergence. Geometries that have little optimal path route selection flexibility are tree, butterfly, XOR and hybrid geometries, while hypercube and butterfly geometries have little flexibility in neighbor selection.

Based on path latency experiments, the atuhors suggest that geometries only actually affect path latency based on their ability to implement Proximity Neighbor/Route Selection. The authors tentatively suggest that the comparatively simple ring geometry be used more frequently because it is flexible and performs just as well or better than other geometries in a variety of tests. It seems odd that the authors should sound so tentative about using ring geometries, seeing as Chord had already been successfully implemented prior to this paper's publishing.

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

Chord is a scalable lookup protocol that supports the operation of mapping a given key onto a node. It is fully distributed and attempts to load balance by acting as a distributed hash function. Nodes only need to keep track of a few successor nodes on the circle, which they maintain in a finger table. Chord allows for a dynamic network; it correctly maintains each node's sucessor and ensures that a node is responsible for every key. It uses stabilization to keep nodes' successor pointers correct but also allows for lookup retries in the event that stabilizing has not happened recently enough. This paper contained a lot of theoretical analysis about Chord in addition to simulation/experimental results, and the authors use these analyses to back up the idea that Chord functions correctly even in the face of incorrect or failed nodes.

Wednesday, November 5, 2008

The Click Modular Router

Click is a flexible software architecture for creating routers. These routers are built from components called elements, which represent small units of router processing and are implemented as C++ objects. Click routers support both push and pull connections but with carefully specified rules as to which output and input types can be connected to explicit Queue elements.

Click runs as a single-thread in the Linux kernel and supports handlers and hot-swapping for changing configurations. It uses polling instead of interrupts and achieves a much higher max loss-free forwarding rate than Linux with interrupts. Click supports extensions such as varying types of packet schedulers and dropping policies. The Click language paired with small, modular elements makes it simple for users to create quick configurations, but a few additions would be helpful such as the use of configuration variables to prevent the user from having to copy information manually.

Monday, November 3, 2008